I'm still reeling from the scathing attacks on Perl.
Well, it is what it is.
...but I still use it a little for manipulating text files, e.g. collating several CSV files into one single CSV that I use in visualisation tools.
No other scripting language is better at manipulating text. This is Perl's strong suit.
Of the scripting languages I know, I find it the easiest to just write and run.
Perl fulfills the role for Unix that REXX does in the IBM mainframe world. And I too will write a 50-line Perl script to automate some simple task, or as an
ad hoc solution. With a little experience, anyone can keep enough Perl in their head to quick-draw it like a gun and get past some obstacle.
The fact that Perl has clear strengths doesn't stop it from having crippling weaknesses, and from being grossly misused. It's the weakness and misuse that we're railing against. And part of that problem is the strong Perl evangelism community that tends to respond to clear failures in the language as if they were shortcomings in programmer skill and knowledge. That's a pretty entrenched defense.
I love regular expressions. I've used them in a number of languages now, and they fit most organically into Perl.
Regular expressions are the heart of text processing in any language. Their elementary integration into Perl is one of its strengths. The propensity to compose write-only (i.e., forever thereafter illegible) regular expressions to implement some rule is one of the weaknesses, not of the language but of how it's commonly used.
I'm still not clear on the whole meaning of greedy and global, though.
Greedy means to match the longest possible string in the input sentence for any one invocation of the matching algorithm. This is most often what you want, and it's what Perl does by default. However, there are real-world cases where the most straightforward-looking regular expression doesn't do what you want.
Consider the task of extracting all the HTML tags from a sentence. A tag is an open angle bracket "<", optionally followed by the closing-tag slash "/", followed by some upper- or lower-case text, followed by the closing angle bracket ">". So a viable regular expression in the classical syntax would be '</?[a-zA-Z]+>', read as "< followed by zero or one / followed by one or more alphabetic characters followed by >". But in fact HTML tags may contain qualifiers that in turn may contain arbitrary text. So you'd be tempted to expand the meat of the expression to '</?.+>', thinking that the engine will stop accepting the over-permissive "." (i.e., match any character) when it sees the closing bracket >.
But in the marked-up sentence '<em>We want to succeed</em>' you want your expression to match just the tags '<em>' and '</em>', but under greedy rules it will match the whole of '<em>We want to succeed</em>', because the first > in the initial '<em>' tag also matches under the '.+' element and the algorithm detects a longer matchable string, ending with the final > after the '</em>' tag.
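In Perl notation, that overmatch looks like this (the '<em>' markup is just an example input):

```perl
my $sentence = '<em>We want to succeed</em>';

# Greedy: .+ keeps consuming past the first > because a longer
# match, ending at the final >, is still available.
my ($match) = $sentence =~ m{(</?.+>)};
print "$match\n";    # prints the whole string: <em>We want to succeed</em>
```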
The greediness question is therefore one of affinity where departure from one parsing state is ambiguous. In the world of actual machines, think of the interaction between your turn signals and hazard lights. When your hazard lights are on, the turn signals don't work. The engineers specifically gave greater "affinity" (or precedence, if you prefer) to the hazard lights when resolving contention for the control signal. Similarly you can choose whether your regular expression engine will give greater affinity to the continuing case or to a succeeding case. Greedy means the continuing case has greater affinity.
A better way to write the expression is '</?[^>]+>', which means "< followed by an optional / followed by one or more characters that aren't >, followed by >". While better suited to the task, it draws criticism because it's not very clear, and such expressions become even less readable as the rule gets more complicated. What, for example, would the expression look like when you recall that > may legitimately appear inside a string literal that's an argument to a qualifier?
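A quick sanity check of that expression in Perl, on the same illustrative sentence:

```perl
my $sentence = '<em>We want to succeed</em>';

# [^>]+ cannot consume a closing bracket, so the match must
# stop at the end of the first tag.
my ($tag) = $sentence =~ m{(</?[^>]+>)};
print "$tag\n";    # <em>

# The non-greedy quantifier .+? reaches the same result here.
my ($lazy) = $sentence =~ m{(</?.+?>)};
print "$lazy\n";   # <em>
```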
Global means that an input sentence may contain several distinct matchable strings (i.e., matches separated by non-matching symbols); a global match returns a list of matching substrings, or lets you restart the search where the previous one left off, and a global replacement replaces all matching substrings rather than just the first. In my example above, the sentence contains two substrings that match the simple HTML tag expression, but the desirable non-greedy match will only catch the first one. A global match gives us either the full set of matches at once, or the ability to restart the search on the rest of the sentence so that we catch the closing tag as well. For regex-based rewrites, the "g" qualifier says "replace all matching substrings, not just the first."
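Both global behaviors in Perl, using the same example sentence:

```perl
my $sentence = '<em>We want to succeed</em>';

# In list context, /g returns every match rather than just the first.
my @tags = $sentence =~ m{(</?[^>]+>)}g;
print "@tags\n";    # <em> </em>

# With a substitution, /g rewrites all matching substrings:
# here, stripping the markup entirely.
(my $plain = $sentence) =~ s{</?[^>]+>}{}g;
print "$plain\n";   # We want to succeed
```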
Ease of using hashes is also neat.
You can argue that all possible data structures can be composed using only a container and an association list, which is the theory behind Perl data structures. However, Perl's array-flattening pretty much eliminates that. And once you add references, composing any meaningful data structure in Perl is a nightmare of dereferences, delimiters, and context-changing coercions. That said, the simple hash is your friend.
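A sketch of both sides of that trade-off (the names and values are made up for illustration):

```perl
use strict;
use warnings;

# The friendly side: a plain hash, e.g. counting words.
my %count;
$count{$_}++ for qw(perl awk perl sed perl);
print "$count{perl}\n";    # 3

# The nightmare side: nesting requires references, and every
# level of access has its own dereferencing syntax.
my %config = (
    paths => [ '/tmp', '/var/log' ],          # array reference
    opts  => { verbose => 1, dry_run => 0 },  # hash reference
);
print "$config{paths}[0]\n";             # /tmp
print scalar @{ $config{paths} }, "\n";  # 2 -- explicit dereference for a count
```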
Why no boolean though?
Because in Perl it's a context, not a data type. Despite all the other contexts that also have associated intrinsic data types.
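Concretely: any scalar can be evaluated in boolean context, and only a short list of values counts as false. A sketch:

```perl
# False in boolean context: 0, '0', '', and undef.
# Everything else is true -- including '0.0' and '00'.
print "true\n"  if '0.0';     # prints
print "true\n"  if '00';      # prints
print "false\n" unless '0';   # prints

# Comparison operators return 1 for true and '' (empty string,
# which numifies to 0) for false -- plain values, not a boolean type.
my $t = (1 == 1);    # 1
my $f = (1 == 2);    # ''
```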