django-brake icon indicating copy to clipboard operation
django-brake copied to clipboard

Fixup regex syntax

Open blag opened this issue 1 month ago • 0 comments

Use raw strings when specifying regexes to ensure that special sequences are not interpreted as escape sequences.

Additionally, while they both utilize the backslash \, escape sequences (\n, \t, etc.) are different than special sequences (\w, \d, etc.).

Special sequences are already character sequences - they are essentially shortcuts to explicit character sequences.

In other words, in a regex, \d == [0-9] 1, and \w == [a-zA-Z0-9_] 1.

Since special sequences are already sequences, they do not need to be contained in [] in regexes, unless they are being combined with other special sequences.

This PR fixes and cleans up the regex syntax, since Python 3.12 is complaining about invalid escape characters:

/usr/local/lib/python3.12/site-packages/brake/decorators.py:25: SyntaxWarning: invalid escape sequence '\d'
  rate_re = re.compile('([\d]+)/([\d]*)([smhd])')

Here's some proof from Python itself:

>>> import re
>>> re.compile('\d+').match("1234") is not None
<stdin>:1: SyntaxWarning: invalid escape sequence '\d'
True
>>> re.compile(r'\d+').match("1234") is not None
True
>>> re.compile(r'[\d]+').match("1234") is not None
True
>>> re.compile(r'[\d]+').match("abc1234xyz") is not None
False
>>> re.compile(r'\d+').match("abc1234xyz") is not None
False

More information on raw strings, escape sequences, and special sequences can be found here: https://docs.python.org/3/library/re.html

1 Ignoring non-ASCII Unicode characters here for the sake of brevity.

blag avatar May 17 '24 18:05 blag