Summary: in this tutorial, you’ll learn about Python regex lookbehind and negative lookbehind.
Introduction to the Python regex lookbehind #
In regular expressions, the lookbehind matches an element if there is another specific element before it. The lookbehind has the following syntax:
(?<=Y)X
In this syntax, the pattern matches X only if Y comes immediately before it. Y is checked but is not included in the match.
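As a minimal sketch of that behavior, the snippet below uses a made-up string (`'user@example'` is only an illustration, not part of the tutorial's data) with `@` as Y and `\w+` as X. It suggests that the text matched by Y is left out of the result:

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = 'user@example'

# Y is '@' and X is '\w+': match word characters that come right after '@'.
match = re.search(r'(?<=@)\w+', text)

print(match.group())  # example
print(match.start())  # 5, the index just after '@'
```

The match starts after the `@`, so the lookbehind only checks what precedes X without consuming it.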
For example, suppose you have the following string and want to match the number 500 but not the number 1:
'1 phone costs $500'
To do that, you can use the following regular expression with a lookbehind:
(?<=\$)\d+
In this pattern:
- (?<=\$) matches an element only if there is a literal $ before it. Since $ is a special character in regular expressions, we use the backslash character \ to escape it. As a result, the regex engine treats \$ as the regular character $.
- \d+ matches one or more digits.
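To illustrate why the backslash matters, here is a small sketch that compares the escaped and unescaped forms on the tutorial's string. It uses `re.escape()` from the standard library to produce the escaped character; the variable names are just for this example:

```python
import re

s = '1 phone costs $500'

# re.escape() adds the backslash that turns '$' into a literal character.
escaped = re.escape('$')
print(escaped)  # \$

# Unescaped, '$' is the end-of-string anchor, so no digits can follow it.
unescaped_result = re.findall(r'(?<=$)\d+', s)
print(unescaped_result)  # []

# Escaped, the lookbehind checks for a literal dollar sign.
escaped_result = re.findall(r'(?<=\$)\d+', s)
print(escaped_result)  # ['500']
```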
The following example uses a regular expression with a lookbehind to match a number that has the $ sign before it:
import re
s = '1 phone costs $500'
# Use a raw string so the backslashes reach the regex engine unchanged
pattern = r'(?<=\$)\d+'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
500
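For comparison, the sketch below runs the same lookbehind on a longer, made-up sentence (the sentence is an assumption for illustration) and contrasts it with a plain `\$\d+` pattern, which keeps the `$` in the matched text:

```python
import re

# Hypothetical sample sentence with several amounts.
s = '2 phones cost $500 and 3 cases cost $45'

# Lookbehind: the '$' must be present but is not part of the match.
with_lookbehind = re.findall(r'(?<=\$)\d+', s)
print(with_lookbehind)  # ['500', '45']

# Plain pattern: the '$' is consumed and appears in each match.
without_lookbehind = re.findall(r'\$\d+', s)
print(without_lookbehind)  # ['$500', '$45']
```

The lookbehind form is convenient when you want only the digits and would otherwise have to strip the `$` or add a capturing group.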
Negative lookbehind #
The negative lookbehind has the following syntax:
(?<!Y)X
This pattern matches X only if Y does not come immediately before it.
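A minimal sketch of this syntax, using a made-up string with single characters for X and Y (here Y is `a` and X is `b`), might look like this:

```python
import re

# Hypothetical sample text: 'b' appears at indexes 1, 4 and 6.
text = 'ab cb b'

# Match every 'b' that is not immediately preceded by 'a'.
positions = [m.start() for m in re.finditer(r'(?<!a)b', text)]

print(positions)  # [4, 6]
```

The `b` at index 1 is skipped because an `a` sits right before it, while the other two are matched.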
The following example uses a negative lookbehind to match a number that doesn’t have the $ sign before it:
import re
s = '1 phone costs $500'
pattern = r'\b(?<!\$)\d+\b'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
1
In the regular expression:
r'\b(?<!\$)\d+\b'
- The \b matches a word boundary.
- The (?<!\$) is a negative lookbehind that succeeds only if the current position is not immediately preceded by the $ sign.
- The \d+ matches a number with one or more digits.
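The word boundaries are easy to overlook, so here is a short sketch on the same string showing what happens when they are dropped. Without \b, the engine can start a match in the middle of 500, where the preceding character is a digit rather than $:

```python
import re

s = '1 phone costs $500'

# No word boundaries: '00' matches because it is preceded by '5', not '$'.
without_boundary = re.findall(r'(?<!\$)\d+', s)
print(without_boundary)  # ['1', '00']

# With word boundaries: a match can only start at the beginning of a number.
with_boundary = re.findall(r'\b(?<!\$)\d+\b', s)
print(with_boundary)  # ['1']
```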
Summary #
- A lookbehind (?<=Y)X matches X only if there is an element Y immediately before it.
- A negative lookbehind (?<!Y)X matches X only if there is no element Y immediately before it.