Regex101
Incorrect Explanation for \W (python)
Bug Description
For Python 2.7, when I enter r"\W" into the regular expression bar, the explanation field tells me that "\W matches any non-word character (equivalent to [^a-zA-Z0-9_])". This is not true: é, for example, is not matched by "\W", but it is matched by "[^a-zA-Z0-9_]".
Reproduction steps
Enter "\W" in the expression field
In the test field enter é
Change the expression field to "[^a-zA-Z0-9_]"
Observe that "\W" does not highlight é and "[^a-zA-Z0-9_]" does.
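For comparison, the same check can be run against Python's own re module. This is a minimal sketch for Python 3, where str patterns are Unicode-aware by default, so the unflagged \W behaves like what the site shows; the re.ASCII flag switches \W back to the [^a-zA-Z0-9_] definition. The variable names are only illustrative. Note that Python 2.7 has the opposite default: there \w and \W are ASCII-only unless re.UNICODE or re.LOCALE is passed.

```python
import re

# "é" written as a single code point (U+00E9) to avoid any
# ambiguity with the decomposed form "e" + combining accent.
text = "\u00e9"

# The explicit ASCII class from the site's explanation.
explicit_class = re.findall(r"[^a-zA-Z0-9_]", text)

# Python 3 default for str patterns: Unicode-aware \W.
unicode_W = re.findall(r"\W", text)

# ASCII-only \W, which is what the explanation text describes.
ascii_W = re.findall(r"\W", text, re.ASCII)

print(explicit_class, unicode_W, ascii_W)
```

Only the re.ASCII variant agrees with the explicit class here, which is the mismatch described in the steps above.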
Expected Outcome
The explanation should make note that "\W" is not equivalent to "[^a-zA-Z0-9_]" in all cases, particularly those dealing with accented characters commonly found in other languages.
Browser
Chrome latest (89.0)
OS
macOS Big Sur
You're entirely correct. This happens because Python is emulated by PCRE on the website. A better test is \w, which matches é by default (read: without the /u modifier). That means Python is always being emulated with PCRE's Unicode switch turned on.
For PCRE with /u, the website describes \W as "\W match any non-word character in any script (equivalent to [^\p{L}\p{N}_])", which is probably correct, but Python 2.7 does not seem to support \p{}. If the set can't be expressed in terms of \p{}, listing every character or code point that \W matches in Unicode would not be ideal. I'm not sure about newer versions of Python. Anyway, this can be fixed easily, but note that Python needs some love overall on the website.
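On the \p{} question: as far as I know, the standard-library re module in Python 3 rejects \p{L} as a bad escape, so the [^\p{L}\p{N}_] form cannot be pasted into it directly. The sketch below checks this at runtime instead of assuming it, and counts how many code points \w matches in the Latin-1 supplement (U+0080 to U+00FF, a range picked only for illustration) with and without re.ASCII. The helper names are hypothetical.

```python
import re

def supports_property_escapes():
    """Return True if this re module compiles a \\p{...} escape."""
    try:
        re.compile(r"\p{L}")
    except re.error:
        return False
    return True

def count_word_chars(flags=0):
    """Count code points in U+0080..U+00FF that \\w matches."""
    word = re.compile(r"\w", flags)
    return sum(1 for cp in range(0x80, 0x100) if word.match(chr(cp)))

has_property_escapes = supports_property_escapes()
unicode_count = count_word_chars()          # Unicode-aware \w
ascii_count = count_word_chars(re.ASCII)    # \w as [a-zA-Z0-9_]

print(has_property_escapes, unicode_count, ascii_count)
```

One caveat: older interpreters such as Python 2.7 treat an unknown escape like \p as the literal letter, so a True result there would not mean real property support. Even in this one small range, the Unicode-aware \w already matches a good number of code points that the ASCII definition misses, which suggests why enumerating them all in the explanation text is impractical.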