Comments

Regular expressions are often long and can be almost unreadable, even if you wrote them yourself some time ago. In such cases comments are very useful. To use them, turn on free-spacing mode by adding the x modifier to your expression. Unescaped whitespace in the pattern is then ignored, and a comment starts with a hash sign # and runs to the end of the line. Let's parse a date in the format YYYY/MM/DD (the year should be between 1000 and 2012, and assume all months have 30 days).

(1\d{3}|20(0\d|1[0-2]))\/(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|30)

And the same regex with comments:

(1\d{3}          # 1000-1999
|                # or
20(0\d|1[0-2]))  # 2000-2009 or 2010-2012
\/               # slash, months after
(0[1-9]          # 01-09
|                # or     
1[0-2])          # 10-12 
\/               # slash, days after
(0[1-9]          # 01-09
|                # or
[12]\d           # 10-29
|                # or 
30)              # 30
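If you want to check that the compact and the commented versions really are the same pattern, here is a minimal Python sketch (Python is our choice for illustration; the sample dates are made up). re.VERBOSE (also spelled re.X) is Python's counterpart of the x modifier. The compact pattern uses [12]\d for days 10-29, mirroring the commented version. Python does not need the slash escaped, so a plain / is used.

```python
import re

# Compact form of the date pattern (days: 01-09, 10-29, 30).
COMPACT = re.compile(r"(1\d{3}|20(0\d|1[0-2]))/(0[1-9]|1[0-2])/(0[1-9]|[12]\d|30)")

# The same pattern in free-spacing mode: whitespace is ignored, # starts a comment.
COMMENTED = re.compile(r"""
    (1\d{3}           # 1000-1999
    |                 # or
    20(0\d|1[0-2]))   # 2000-2009 or 2010-2012
    /                 # slash, months after
    (0[1-9]           # 01-09
    |                 # or
    1[0-2])           # 10-12
    /                 # slash, days after
    (0[1-9]           # 01-09
    |                 # or
    [12]\d            # 10-29
    |                 # or
    30)               # 30
""", re.VERBOSE)

samples = ["1999/12/30", "2012/01/01", "2013/01/01", "2000/13/01", "2005/02/00", "2005/02/31"]
for s in samples:
    # fullmatch: the whole string must be a date, not just a part of it.
    compact_ok = COMPACT.fullmatch(s) is not None
    commented_ok = COMMENTED.fullmatch(s) is not None
    assert compact_ok == commented_ok
    print(s, commented_ok)
# 1999/12/30 and 2012/01/01 print True; the other four print False.
```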

Lookahead assertions

Sometimes you need to match something only if it is followed by something else, without including that something else in the match. This is what lookahead assertions are for. There are two types of lookahead assertion:

  • positive (?=regex) - matches only if it is followed by regex
  • negative (?!regex) - matches only if it is NOT followed by regex
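Here is a small Python sketch of both types side by side (the sample string and patterns are made up for illustration). It also shows a common trap with negative lookahead: if the lookahead fails, the engine backtracks to a shorter number, which may no longer be directly followed by the forbidden text.

```python
import re

text = "10px 20em 30px"

# Positive lookahead: numbers followed by "px"; "px" itself is not part of the match.
print(re.findall(r"\d+(?=px)", text))      # ['10', '30']

# Naive negative lookahead: backtracking lets "1" (followed by "0px") slip through.
print(re.findall(r"\d+(?!px)", text))      # ['1', '20', '3']

# Also forbidding a following digit keeps whole numbers only.
print(re.findall(r"\d+(?!\d|px)", text))   # ['20']
```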

Let's look at a simple example.

Say you have a list of files:

file.sass
file.xml
index.html
model.xml
model.properties
contentModel.xml

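and you want to find the names of all XML files without the .xml part. You could use the simple pattern ^[a-zA-Z]+\.xml$, but then the .xml part is included in the match. To get only the file name, use this pattern: ^[a-zA-Z]+(?=\.xml$). The range [a-zA-Z] is needed because contentModel contains an uppercase letter, and the dot is escaped so that it matches only a literal dot. The multiline flag m must also be on, so that ^ and $ match at the start and end of each line rather than of the whole text.

Some explanation:

^          # beginning of the line
[a-zA-Z]+  # file name: one or more letters
(?=        # start of the lookahead assertion
\.xml$)    # what must follow the file name: ".xml" and the end of the line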

You can play with it here: https://regex101.com/r/tK4lU3/1

The result is:

file
model
contentModel
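To try the same thing outside regex101, here is a minimal Python sketch (any engine with lookahead support would do). re.MULTILINE plays the role of the m flag. The second call shows what happens with lowercase-only letters, and the third uses a negative lookahead to list the files that are not XML.

```python
import re

files = "\n".join([
    "file.sass",
    "file.xml",
    "index.html",
    "model.xml",
    "model.properties",
    "contentModel.xml",
])

# re.MULTILINE: ^ and $ match at the start and end of every line.
xml_names = re.findall(r"^[a-zA-Z]+(?=\.xml$)", files, re.MULTILINE)
print(xml_names)   # ['file', 'model', 'contentModel']

# Lowercase letters only: "content" is not followed by ".xml", so contentModel is missed.
lower_only = re.findall(r"^[a-z]+(?=\.xml$)", files, re.MULTILINE)
print(lower_only)  # ['file', 'model']

# Negative lookahead: whole lines whose extension is anything but "xml".
others = re.findall(r"^[a-zA-Z]+\.(?!xml$)[a-z]+$", files, re.MULTILINE)
print(others)      # ['file.sass', 'index.html', 'model.properties']
```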

Lookbehind assertions

A lookbehind assertion works the same way, but it checks the text before the current position instead of after it. It also comes in two types: positive (?<=regex) and negative (?<!regex).

Here is almost the same example, but now let's find all the file extensions used with the file name model. The pattern is (?<=^model\.)[a-z]+$ (again with the multiline flag), and the result is:

xml
properties

If-then-else conditions

Regular expressions can also check a condition and choose which pattern to match depending on the result. The syntax is (?(?=regex)then|else), or just (?(?=regex)then) if you don't need an else branch.
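A caveat if you try this in Python: the built-in re module does not accept a lookahead as the condition. It only supports the group-reference form (?(id)then|else), where the condition is whether group id took part in the match. A small sketch of that form (the tag-like pattern is purely illustrative):

```python
import re

# Group 1 captures an optional "<"; (?(1)>) then requires ">" only if group 1 matched.
tag = re.compile(r"^(<)?[a-z]+(?(1)>)$")

for s in ["<b>", "b", "<b", "b>"]:
    print(s, bool(tag.match(s)))
# <b> True
# b True
# <b False
# b> False
```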