Introduction
- Regular expressions are a way of describing patterns in text.
- Most text editors and many other tools include a regular expression engine for searching text by pattern.
- Regular expressions are often offered as a mode of find/replace that can be turned on and off by the user.
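Here is a minimal sketch of the find/replace idea using Python's built-in `re` module. Python is our choice; the lesson is not tied to any one tool. The sample text and the phone-number-style pattern are made up for illustration. The example contrasts an ordinary literal replacement with a regex replacement.

```python
import re

text = "Call 555-1234 or 555-9876 today."

# Ordinary find/replace: only the exact literal text is replaced.
literal = text.replace("555-1234", "XXX-XXXX")

# Regex find/replace: anything fitting the pattern is replaced.
# [0-9]{3}-[0-9]{4} means "three digits, a hyphen, four digits".
masked = re.sub(r"[0-9]{3}-[0-9]{4}", "XXX-XXXX", text)

print(literal)  # Call XXX-XXXX or 555-9876 today.
print(masked)   # Call XXX-XXXX or XXX-XXXX today.
```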
Regex Fundamentals
- Wrap characters in `[]` to define a set of valid matches for a given position.
- Use `-` between two characters to define a range of characters to match.
- Use `^` at the start of a set to invert it, indicating that the given characters should be excluded from a match.
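The three set rules above can be sketched with Python's `re` module, which is an assumption on our part since the lesson does not name a tool. The word list is invented for illustration. `re.fullmatch` is used so that each pattern must match the entire word.

```python
import re

words = ["cat", "bat", "rat", "hat", "mat"]

# [bc] : a set; the first position must be "b" or "c".
set_matches = [w for w in words if re.fullmatch(r"[bc]at", w)]

# [h-m] : a range; the first position may be any letter from "h" to "m".
range_matches = [w for w in words if re.fullmatch(r"[h-m]at", w)]

# [^bc] : an inverted set; the first position may be anything except "b" or "c".
inverted_matches = [w for w in words if re.fullmatch(r"[^bc]at", w)]

print(set_matches)       # ['cat', 'bat']
print(range_matches)     # ['hat', 'mat']
print(inverted_matches)  # ['rat', 'hat', 'mat']
```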
Tokens and Wildcards
- Use the `\b` token to match a word boundary, and `^` and `$` to match the beginning and end of a line, respectively.
- `\` has special meaning in regular expressions, and `\\` should be used to specify a literal backslash in a pattern.
- `.` matches any single character in that position (many engines exclude the newline character by default).
- When composing a regular expression, it is good practice to be as specific as possible about what you want to match.
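The tokens above can be tried out in Python's `re` module. This is a sketch under that assumption, and the sample strings are made up. Two Python-specific details are worth flagging. First, `^` and `$` only anchor to individual lines when `re.MULTILINE` is passed. Second, the `r"..."` raw-string prefix stops Python itself from consuming backslashes before the regex engine sees them.

```python
import re

text = "cat\nconcatenate\nthe cat sat"

# Without \b, the "cat" inside "concatenate" also matches.
anywhere = re.findall(r"cat", text)          # 3 matches
# \b requires a word boundary on both sides.
whole_words = re.findall(r"\bcat\b", text)   # 2 matches

# In Python, ^ and $ anchor to each line only with re.MULTILINE;
# otherwise they anchor to the start and end of the whole string.
line_is_cat = re.findall(r"^cat$", text, flags=re.MULTILINE)  # ['cat']
string_is_cat = re.findall(r"^cat$", text)                    # []

# In a pattern, \\ matches one literal backslash.
path = r"C:\Users\ana"
backslashes = re.findall(r"\\", path)  # two matches

# . matches any single character; being more specific with \.
# matches only a literal dot.
any_char = re.findall(r"c.t", "cat cot c.t")       # ['cat', 'cot', 'c.t']
literal_dot = re.findall(r"c\.t", "cat cot c.t")   # ['c.t']
```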
Repeated Matches
- `?` indicates that the preceding character or set should be treated as optional in this position.
- `*` indicates that the preceding character or set should appear 0 or more times in this position.
- `+` indicates that the preceding character or set should appear 1 or more times in this position.
- `{2,4}` indicates that the preceding character or set should appear at least twice but no more than four times in this position.
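Each repetition operator is shown below with Python's `re.findall`. The choice of tool is our assumption, and the inputs are invented examples. In the last pattern, `\b` is added so that `{2,4}` cannot match just part of a longer number.

```python
import re

# ? : the "u" is optional.
optional = re.findall(r"colou?r", "color colour colouur")   # ['color', 'colour']

# * : zero or more "b" characters.
zero_or_more = re.findall(r"ab*c", "ac abc abbbc")          # ['ac', 'abc', 'abbbc']

# + : one or more "b" characters, so "ac" no longer matches.
one_or_more = re.findall(r"ab+c", "ac abc abbbc")           # ['abc', 'abbbc']

# {2,4} : between two and four digits, bounded by word boundaries.
two_to_four = re.findall(r"\b[0-9]{2,4}\b", "7 42 365 2024 12345")  # ['42', '365', '2024']
```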
Capture Groups and References
- Capture groups are defined within `()` in a regular expression.
- The left-most capture group in a regular expression is referred to with `\1` in the replacement string, the next with `\2`, and so on (some tools use `$1`, `$2` instead).
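The following sketch shows capture groups and back-references in Python's `re.sub`, which accepts `\1`, `\2`, and so on in its replacement string. The tool choice and the sample strings, including the date format, are assumptions for illustration. Groups are numbered left to right by their opening parenthesis.

```python
import re

# Two groups: the first word is \1, the second word is \2.
swapped = re.sub(r"([a-z]+) ([a-z]+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# Three groups: reorder a YYYY-MM-DD date as DD/MM/YYYY.
dates = re.sub(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", r"\3/\2/\1", "Due 2024-03-15.")
print(dates)  # Due 15/03/2024.
```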
Alternative Matches
- Alternative strings to match can be combined with `|`; for example, `cat|dog` matches either "cat" or "dog".
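Here is a short sketch of alternation in Python's `re` module. As before, the tool and the sample text are our assumptions. `|` applies to everything on either side of it, so wrapping the alternatives in `()` limits them to part of the pattern. `re.finditer` with `group(0)` is used so that the whole match is returned, not just the group's contents.

```python
import re

# Either alternative may match anywhere in the text.
pets = re.findall(r"cat|dog", "I own a cat, a dog, and a bird.")
print(pets)  # ['cat', 'dog']

# Grouping restricts the alternation to the middle letter only.
spellings = [m.group(0) for m in re.finditer(r"gr(e|a)y", "grey gray groy")]
print(spellings)  # ['grey', 'gray']
```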