PyTeaser icon indicating copy to clipboard operation
PyTeaser copied to clipboard

Not sure if this regex does what you want

Open LouisMichael opened this issue 5 years ago • 1 comments

I was looking over project regex and I noticed this one. It seems to try to look for punctuation and is trying to use \p{} for unicode but this is not a feature supported in python. It also might be dead code in general.

LouisMichael avatar Feb 19 '19 02:02 LouisMichael

Quick test:

$ cat t.py
import re

PUNCTUATION = re.compile("[^\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}\\p{Pc}\\s]")
print("regex: " + PUNCTUATION.pattern)
for char in ["a", "p", "!"]:
	if PUNCTUATION.search(char):
		print("Matches a '{}'".format(char))
	else:
		print("Does not match a '{}'".format(char))

$ python3 t.py
regex: [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\s]
Matches a 'a'
Does not match a 'p'
Matches a '!'

It matches "a" but not "p" because the character class is looking for NOT "p { L l ...".

davisjam avatar Feb 19 '19 02:02 davisjam