Introduction


  • Regular expressions are a way of describing patterns in text.
  • Most text editors and many other tools include a regular expression engine for searching text with such patterns.
  • Regular expressions are often offered as a mode of find/replace that can be turned on and off by the user.

Regex Fundamentals


  • Wrap characters in [] to define a set of valid matches for a given position.
  • Use - between two characters to define a range of characters to match.
  • Use ^ at the start of a set to invert it, indicating that the given characters should be excluded from a match.
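
The three kinds of set above can be tried out in a short Python sketch using the standard re module. The sample text and variable names are made up for illustration, and other regex engines may differ in small details.

```python
import re

text = "cat cot cut c9t"

# A set matches exactly one character from the listed options.
set_matches = re.findall(r"c[aou]t", text)

# A range inside a set: here, any single digit.
range_matches = re.findall(r"c[0-9]t", text)

# An inverted set: any single character except a or o.
inverted_matches = re.findall(r"c[^ao]t", text)

print(set_matches)       # ['cat', 'cot', 'cut']
print(range_matches)     # ['c9t']
print(inverted_matches)  # ['cut', 'c9t']
```

Note that a set always stands for a single position: c[aou]t matches three characters, not five.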

Tokens and Wildcards


  • Use the \b token to match a word boundary, and ^ and $ to match the beginning and end of a line respectively.
  • \ has special meaning in regular expressions, and \\ should be used to specify a literal backslash in a pattern.
  • . matches any single character in a given position (in most engines, any character except a newline).
  • When composing a regular expression, it is good practice to be as specific as possible about what you want to match.
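
As a rough illustration of these tokens, the following Python sketch uses the standard re module on made-up sample strings. One Python-specific detail is assumed here: ^ and $ match at the start and end of every line only when the re.MULTILINE flag is passed. Editors typically behave that way by default.

```python
import re

words = "cat concat category"
# Without \b, "cat" is also found inside longer words.
anywhere = re.findall(r"cat", words)
# \b on both sides restricts the match to the whole word.
whole_word = re.findall(r"\bcat\b", words)

lines = "apple pie\npie chart\napple"
# With re.MULTILINE, ^ and $ apply to each line rather than the whole string.
starts = re.findall(r"^apple", lines, re.MULTILINE)
ends = re.findall(r"pie$", lines, re.MULTILINE)

path = r"C:\temp"
# An escaped backslash in the pattern matches one literal backslash.
backslashes = re.findall(r"\\", path)

codes = "cat c9t c t coat"
# . accepts any character here, including a digit or a space.
loose = re.findall(r"c.t", codes)
# A set of lowercase letters is more specific about what may appear.
strict = re.findall(r"c[a-z]t", codes)

print(anywhere)          # ['cat', 'cat', 'cat']
print(whole_word)        # ['cat']
print(starts, ends)      # ['apple', 'apple'] ['pie']
print(len(backslashes))  # 1
print(loose)             # ['cat', 'c9t', 'c t']
print(strict)            # ['cat']
```

The last two results show why specificity matters: the wildcard version picks up strings that were probably never intended to match.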

Repeated Matches


  • ? indicates that the preceding character or set should be treated as optional in this position.
  • * indicates that the preceding character or set should appear 0 or more times in this position.
  • + indicates that the preceding character or set should appear 1 or more times in this position.
  • {2,4} indicates that the preceding character or set should appear at least twice but no more than four times in this position.
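
The four repetition operators can be compared side by side in a small Python sketch. The sample strings are invented for illustration, and \b is added in the last pattern only so that longer runs of digits are not partially matched.

```python
import re

# ? makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour colouur")

letters = "ac abc abbbc"
# * allows zero or more repetitions of the preceding character.
zero_or_more = re.findall(r"ab*c", letters)
# + requires at least one repetition.
one_or_more = re.findall(r"ab+c", letters)

# {2,4} allows between two and four repetitions of the preceding set.
two_to_four = re.findall(r"\b[0-9]{2,4}\b", "7 42 2024 123456")

print(optional)      # ['color', 'colour']
print(zero_or_more)  # ['ac', 'abc', 'abbbc']
print(one_or_more)   # ['abc', 'abbbc']
print(two_to_four)   # ['42', '2024']
```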

Capture Groups and References


  • Capture groups are defined within () in a regular expression.
  • The left-most capture group in a regular expression is referred to with \1 in the replacement string, the next with \2, and so on.
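
A minimal sketch of capture groups and references in Python, using re.sub from the standard re module. The name and pattern are invented for illustration. Python writes references in the replacement string as \1 and \2; some editors and tools use a different notation, such as $1, so check the documentation of the tool you are using.

```python
import re

name = "Lovelace, Ada"

# Two capture groups: the surname (group 1) and the given name (group 2).
pattern = r"([A-Za-z]+), ([A-Za-z]+)"

# The replacement string swaps the two captured pieces.
swapped = re.sub(pattern, r"\2 \1", name)

# The same groups can also be read back individually from a match.
match = re.search(pattern, name)
surname = match.group(1)
given_name = match.group(2)

print(swapped)              # Ada Lovelace
print(surname, given_name)  # Lovelace Ada
```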

Alternative Matches


  • Alternative strings to match can be combined with |.
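
Alternation can be sketched in Python as follows, again with made-up sample text. The second pattern assumes the usual behaviour that | applies to everything on either side of it unless parentheses limit its scope.

```python
import re

# Either complete string may match at a given position.
pets = re.findall(r"cat|dog", "cat dog bird")

# Parentheses limit the alternatives to part of the pattern,
# so "day" is required after either option.
days = re.sub(r"(Mon|Tues)day", "X", "Monday Tuesday Sunday")

print(pets)  # ['cat', 'dog']
print(days)  # X X Sunday
```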