Inline python re modifiers not working

Open SteveBarnes-BH opened this issue 3 years ago • 6 comments

Bug Description

The regex #define\s+(?i:CONFIGXML_HEADER) reports:

(? Incomplete group structure
) Incomplete group structure

However, it is a valid regular expression in Python 3.9 (and possibly other versions). I need #define to be matched case-sensitively but CONFIGXML_HEADER to be matched case-insensitively, which is what the scoped (?i:...) group expresses.
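For reference, a minimal sketch of how this pattern behaves in Python's own re module, assuming Python 3.6 or later (the version in which the docs quoted later in this thread say scoped inline flags were added). The sample input strings are made up for illustration.

```python
import re

# Only the (?i:...) group ignores case; "#define" stays case-sensitive.
pattern = re.compile(r"#define\s+(?i:CONFIGXML_HEADER)")

# Hypothetical sample lines, not taken from the original report.
matches_mixed_case = pattern.search("#define configxml_Header") is not None
matches_upper_define = pattern.search("#DEFINE CONFIGXML_HEADER") is not None

print(matches_mixed_case)    # True: the header name may be in any case
print(matches_upper_define)  # False: "#DEFINE" does not match "#define"
```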

Reproduction steps

Paste the above regex into the regex field on the site.

Expected Outcome

Partially case sensitive regex.

Browser

Chrome

OS

Windows 10

SteveBarnes-BH avatar Jan 11 '22 15:01 SteveBarnes-BH

This is a long-standing issue: the website caters to Python 2.7, which is very outdated at this point. I will have to rework it completely to support Python 3+ ASAP.

firasdib avatar Jan 11 '22 17:01 firasdib

In this context, it is probably worth mentioning that all official support for Python 2.x ended on January 1, 2020.

SteveBarnes-BH avatar Jan 17 '22 08:01 SteveBarnes-BH

@SteveBarnes-BH Is there a writeup somewhere outlining the regex differences between 2.7 and 3.x?

firasdib avatar Jan 17 '22 08:01 firasdib

https://docs.python.org/3/library/re.html has:

(?aiLmsux-imsx:...) (Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

The letters 'a', 'L' and 'u' are mutually exclusive when used as inline flags, so they can’t be combined or follow '-'. Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In byte pattern (?L:...) switches to locale depending matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.

New in version 3.6.

Changed in version 3.7: The letters 'a', 'L' and 'u' also can be used in a group.
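As a rough illustration of the quoted rules, the sketch below shows a flag being removed for one group with '-' and the ASCII override with (?a:...). It assumes Python 3.7 or later (needed for 'a' inside a group); the patterns and sample strings are invented for the example.

```python
import re

# Global (?i) at the start, switched off again for one group with (?-i:...).
partial = re.compile(r"(?i)config(?-i:XML)")
upper_ok = partial.fullmatch("CONFIGXML") is not None   # "XML" matches exactly
lower_xml = partial.fullmatch("configxml") is not None  # "xml" is not "XML"

# In a str pattern, \w is Unicode-aware by default; (?a:...) restricts it
# to ASCII for the group only.
word = "caf\u00e9"  # "cafe" with an accented final letter
unicode_match = re.fullmatch(r"\w+", word) is not None
ascii_match = re.fullmatch(r"(?a:\w+)", word) is not None

print(upper_ok, lower_xml)         # True False
print(unicode_match, ascii_match)  # True False
```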

The How To is a useful resource as well.

There is also the significant difference that both regular expressions and their targets can be either strings (i.e. Unicode) or bytes, and the two don't mix: re.findall("Fred", b"Fred") will raise an error (TypeError: cannot use a string pattern on a bytes-like object). I would suggest this is probably best handled as a note on your site rather than something the site tries to deal with.
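A short sketch of the str/bytes mismatch under Python 3; the variable names are only for illustration.

```python
import re

# Mixing a str pattern with a bytes target raises TypeError.
try:
    re.findall("Fred", b"Fred")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Pattern and target of the same type both work.
str_result = re.findall("Fred", "Fred")
bytes_result = re.findall(b"Fred", b"Fred")

print(error_message)  # cannot use a string pattern on a bytes-like object
print(str_result)     # ['Fred']
print(bytes_result)   # [b'Fred']
```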

SteveBarnes-BH avatar Jan 17 '22 08:01 SteveBarnes-BH

@firasdib have you managed to make any progress on this issue?

Either way, the name of the "Python" flavor should probably be "Python 2.7", to make sure users understand that Python 3 syntax is not supported.

thesuperzapper avatar May 31 '22 07:05 thesuperzapper

I would agree with the statement that you should make it very clear that this is Python 2.7.

I removed the rest of this because the issue was that I was not using raw strings, so the \b was being interpreted as a backspace character rather than the word-boundary escape sequence.
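The raw-string pitfall can be reproduced with a small sketch like the one below. The original pattern was removed from this comment, so the pattern and text here are hypothetical stand-ins.

```python
import re

text = "a word here"  # hypothetical sample text

# In a normal string literal, "\b" is the backspace character (0x08),
# so the regex engine looks for literal backspaces around "word".
non_raw = re.findall("\bword\b", text)

# In a raw string, the backslash reaches the regex engine intact,
# so \b means "word boundary".
raw = re.findall(r"\bword\b", text)

print(non_raw)  # []
print(raw)      # ['word']
```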

weallcock avatar Nov 20 '22 20:11 weallcock