Regex101
Inline Python re modifiers not working
Bug Description
The regex
#define\s+(?i:CONFIGXML_HEADER)
reports:
(? Incomplete group structure
) Incomplete group structure
However, it is a valid regular expression in Python 3.9 (and possibly other versions). The intent is for #define to be matched case-sensitively and CONFIGXML_HEADER case-insensitively.
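A minimal sketch of the expected behaviour, assuming Python 3.6 or later (where scoped inline flags are available). The sample strings are hypothetical and only illustrate which part of the pattern ignores case:

```python
import re

# Scoped inline flag: only the group is case-insensitive (Python 3.6+).
pattern = re.compile(r"#define\s+(?i:CONFIGXML_HEADER)")

# Hypothetical sample lines, not taken from a real header file.
lower_name = pattern.search("#define configxml_header")
upper_define = pattern.search("#DEFINE CONFIGXML_HEADER")

print(lower_name is not None)  # the name part ignores case
print(upper_define is None)    # "#define" itself stays case-sensitive
```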
Reproduction steps
Paste the above regex into the regex field on the site.
Expected Outcome
Partially case sensitive regex.
Browser
Chrome
OS
Windows 10
This is a long-standing issue: the website targets Python 2.7, which is very outdated at this point. I will have to rework it completely to support Python 3+ ASAP.
In this context it is probably worth mentioning that all official support for Python 2.x ended 01/01/2020.
@SteveBarnes-BH Is there a writeup somewhere outlining the regex differences between 2.7 and 3.x?
https://docs.python.org/3/library/re.html has:
(?aiLmsux-imsx:...) (Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)
The letters 'a', 'L' and 'u' are mutually exclusive when used as inline flags, so they can’t be combined or follow '-'. Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In byte pattern (?L:...) switches to locale depending matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.
New in version 3.6.
Changed in version 3.7: The letters 'a', 'L' and 'u' also can be used in a group.
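To illustrate the quoted documentation, here is a small sketch, assuming Python 3.7 or later. The patterns and test strings are made up for illustration; it shows a flag being removed for one group with '-' and the ASCII override applying only inside its group:

```python
import re

# (?i) at the start sets IGNORECASE for the whole pattern;
# (?-i:...) removes it again for that one group.
mixed = re.compile(r"(?i)abc(?-i:DEF)")
ok = mixed.fullmatch("ABCDEF")
rejected = mixed.fullmatch("ABCdef")

# In a str pattern, (?a:...) restricts \w to ASCII inside the group only;
# the first \w outside the group still matches Unicode letters.
ascii_scoped = re.compile(r"\w(?a:\w)")
ascii_ok = ascii_scoped.fullmatch("\u00e9a")
ascii_rejected = ascii_scoped.fullmatch("\u00e9\u00e9")

print(ok is not None, rejected is None)
print(ascii_ok is not None, ascii_rejected is None)
```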
The Regular Expression HOWTO in the Python documentation is a useful resource as well.
There is also the significant difference that both patterns and the targets they are applied to can be either str (i.e. Unicode) or bytes, and the two don't mix: re.findall("Fred", b"Fred") raises an error (TypeError: cannot use a string pattern on a bytes-like object). I would suggest this is probably best handled with a comment on your site rather than trying to deal with it.
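A short sketch of the str/bytes separation in Python 3, reusing the "Fred" example. The variable names are only for illustration:

```python
import re

# Matching types work: str pattern on str, bytes pattern on bytes.
str_hits = re.findall("Fred", "Fred")
bytes_hits = re.findall(b"Fred", b"Fred")

# Mixing a str pattern with a bytes target raises TypeError.
try:
    re.findall("Fred", b"Fred")
    mixed_error = None
except TypeError as exc:
    mixed_error = str(exc)

print(str_hits, bytes_hits)
print(mixed_error)
```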
@firasdib have you managed to make any progress on this issue?
Either way, the name of the "Python" flavor should probably be "Python 2.7", to make sure users understand that Python 3 syntax is not supported.
I would agree that you should make it very clear that this is Python 2.7.
I removed the rest of this comment because the issue was that I was not using raw strings, so \b was being interpreted as the backspace character rather than the word-boundary escape sequence.
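For anyone hitting the same thing, a minimal sketch of the difference. The pattern and text are made up for illustration:

```python
import re

text = "a foo b"

# In a normal string literal, "\b" is the backspace character (U+0008),
# so this pattern looks for backspace + "foo" + backspace.
non_raw = re.search("\bfoo\b", text)

# In a raw string, the backslash reaches the regex engine intact,
# and \b is treated as a word boundary.
raw = re.search(r"\bfoo\b", text)

print(non_raw)      # no match: the text contains no backspace characters
print(raw.group())  # the word-boundary version finds "foo"
```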