PyVCF

Fix invalid escape sequences in regex strings

Open DavidCain opened this issue 2 years ago • 0 comments

Summary

This commit fixes the deprecation warnings raised by backslashes in string literals that are not part of a valid escape sequence. Fixing them will help this library be used with newer versions of Python.

The resulting string values do not change (for current versions of Python):

>>> r'[\[\]]' == '[\[\]]'
True

Examples

$ python -Wd -c 'print("\d")'
DeprecationWarning: invalid escape sequence \d
$ python -W error -c 'print("\d")'
SyntaxError: invalid escape sequence \d

Explanation

For an explanation of the problem (and the recommended solution), see: https://docs.python.org/3/library/re.html

Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'.
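
As a minimal sketch of the raw-string fix described above (the pattern and the sample input are illustrative, not taken from PyVCF), the two spellings below compile to the same regular expression, but only the raw string avoids doubling the backslash:

```python
import re

# Raw string: the backslash reaches the regex engine untouched.
DIGITS = re.compile(r"\d+")

# Equivalent without raw-string notation: the backslash must be doubled.
DIGITS_ESCAPED = re.compile("\\d+")

# Both literals produce the same two-character escape followed by "+".
assert DIGITS.pattern == DIGITS_ESCAPED.pattern

# Illustrative input only; prints ['1', '12345', '67890'].
print(DIGITS.findall("chr1:12345-67890"))
```

Writing the same pattern as "\d+" (no prefix, single backslash) currently yields the same string, but it is the form that triggers the warning.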

How to keep these errors out of source code

I didn't include any such changes in this commit, but there are a few ways to make sure that new invalid escape sequences are not introduced:

  • Use a linter!
    • pylint has anomalous-backslash-in-string
    • flake8 has W605
    • other linters work too!
  • Escalate deprecation warnings to full errors at test time (e.g. setting filterwarnings to error:invalid escape sequence:DeprecationWarning will turn these warnings into errors)
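
The escalation idea can also be sketched in plain Python using only the standard library. The helper name `has_invalid_escape` is hypothetical and not part of PyVCF or any linter. It compiles a source snippet with all warnings promoted to errors and reports whether an invalid escape sequence was the cause. The warning category and exact message vary between Python versions, so the sketch catches several exception types and matches on a substring:

```python
import warnings


def has_invalid_escape(source: str) -> bool:
    """Return True if compiling `source` reports an invalid escape sequence.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings():
        # Promote every warning to an exception while this block is active.
        warnings.simplefilter("error")
        try:
            compile(source, "<snippet>", "exec")
        except (SyntaxError, DeprecationWarning, SyntaxWarning) as exc:
            # Other syntax problems also land here; only report escape issues.
            return "invalid escape sequence" in str(exc)
    return False


# The snippets are passed as raw strings so the backslash reaches compile().
print(has_invalid_escape(r'pattern = "\d+"'))   # non-raw literal
print(has_invalid_escape(r'pattern = r"\d+"'))  # raw literal
```

A check like this could run in a test suite over the package's source files, although a linter rule such as W605 is likely the simpler option.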

DavidCain, Aug 01 '22 21:08