Introduction
- Regular expressions are a way of describing patterns in text.
- Most text editors and many other tools include a regular expression engine for performing these kinds of searches.
- Regular expressions are often offered as a mode of find/replace that can be turned on and off by the user.
Regex Fundamentals
- Wrap characters in `[]` to define a set of valid matches for a given position.
- Use `-` between two characters to define a range of characters to match.
- Place `^` at the start of a set to invert it, indicating that the given characters should be excluded from a match.
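
As a minimal sketch of sets, ranges, and inverted sets, the following uses Python's `re` module. The choice of Python and the sample strings are assumptions made for illustration; other engines may differ in detail.

```python
import re

# [aeiou] matches any single vowel at that position.
vowels = re.findall(r"[aeiou]", "regex")

# [0-9] uses a range to match any single digit.
digits = re.findall(r"[0-9]", "room 42b")

# [^0-9 ] inverts the set: any character that is neither a digit nor a space.
others = re.findall(r"[^0-9 ]", "a1 b2")

print(vowels)  # ['e', 'e']
print(digits)  # ['4', '2']
print(others)  # ['a', 'b']
```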
Tokens and Wildcards
- Use the `\b` token to match a word boundary, and `^` and `$` to match the beginning and end of a line respectively.
- `\` has special meaning in regular expressions, and `\\` should be used to specify a literal backslash in a pattern.
- `.` describes a position that could match any character.
- When composing a regular expression, it is good practice to be as specific as possible about what you want to match.
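
The tokens above could be tried out as follows. This is a sketch in Python's `re` module with made-up sample text; note that in Python `^` and `$` only match at every line (rather than only at the start and end of the whole string) when the `re.MULTILINE` flag is passed, which may not be how a given text editor behaves.

```python
import re

# Sample text (an assumption for illustration); "\\" is one literal backslash.
text = "cat concatenate\ndog.jpg C:\\temp"

# \b restricts the match to "cat" as a whole word, not the "cat" in "concatenate".
whole_words = re.findall(r"\bcat\b", text)

# ^ and $ anchor to the start and end of each line under re.MULTILINE.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# An escaped backslash in the pattern matches one literal backslash.
backslashes = re.findall(r"\\", text)

# . matches any single character in that position.
any_char = re.findall(r"c.t", "cat cot c-t")

print(whole_words)       # ['cat']
print(line_starts)       # ['cat', 'dog']
print(line_ends)         # ['concatenate', 'temp']
print(len(backslashes))  # 1
print(any_char)          # ['cat', 'cot', 'c-t']
```

The last pattern also shows why specificity matters: `c.t` accepts `c-t`, whereas a set such as `c[ao]t` would not.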
Repeated Matches
- `?` indicates that the preceding character or set should be treated as optional in this position.
- `*` indicates that the preceding character or set should appear 0 or more times in this position.
- `+` indicates that the preceding character or set should appear 1 or more times in this position.
- `{2,4}` indicates that the preceding character or set should appear at least twice but no more than four times in this position.
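
A short sketch of the four repetition operators, again in Python's `re` module with invented sample strings:

```python
import re

# ? makes the preceding "u" optional.
optional = re.findall(r"colou?r", "color colour")

# * allows the preceding "o" to appear zero or more times.
zero_or_more = re.findall(r"go*d", "gd god good")

# + requires the preceding "o" to appear at least once.
one_or_more = re.findall(r"go+d", "gd god good")

# {2,4} requires two to four digits; \b keeps longer numbers from matching in part.
two_to_four = re.findall(r"\b[0-9]{2,4}\b", "7 42 2024 123456")

print(optional)      # ['color', 'colour']
print(zero_or_more)  # ['gd', 'god', 'good']
print(one_or_more)   # ['god', 'good']
print(two_to_four)   # ['42', '2024']
```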
Capture Groups and References
- Capture groups are defined within `()` in a regular expression.
- The left-most capture group in a regular expression is referred to with `\1` in the replacement string, the next with `\2`, and so on.
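
As an illustration of capture groups in a find/replace, the sketch below swaps two captured names using Python's `re.sub`. The pattern and input string are assumptions chosen for the example. Some tools write group references differently in the replacement (for example `$1` instead of `\1`), so check the documentation of the tool in use.

```python
import re

# Two capture groups: the family name, then the given name.
pattern = r"(\w+), (\w+)"

# In the replacement string, \1 is the first group and \2 the second.
swapped = re.sub(pattern, r"\2 \1", "Lovelace, Ada")

print(swapped)  # Ada Lovelace
```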
Alternative Matches
- Alternative strings to match can be combined with `|`.
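
A minimal sketch of alternation in Python's `re` module, using invented sample text:

```python
import re

text = "grey cat, gray dog, green bird"

# | matches either of the strings on its two sides.
spellings = re.findall(r"grey|gray", text)
animals = re.findall(r"cat|dog", text)

print(spellings)  # ['grey', 'gray']
print(animals)    # ['cat', 'dog']
```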