packaging
packaging copied to clipboard
`Version` and `Specifier` accept (erroneously) some non-ASCII letters in the *local version* segment
Reproducing the behavior concerning packaging.version.Version
:
Python 3.9.7 (default, Oct 4 2021, 18:09:29)
[...]
>>> import packaging.version
>>> packaging.version.Version('1.2+\u0130\u0131\u017f\u212a')
<Version('1.2+i̇ıſk')>
The cause is that packaging.version.VERSION_PATTERN
makes use of a-z
character ranges in conjunction with re.IGNORECASE
and (implicit in Python 3.x) re.UNICODE
(see the 2nd paragraph of this fragment: https://docs.python.org/3/library/re.html#re.IGNORECASE).
It can be fixed in one of the following two ways:
-
either by adding
re.ASCII
to flags (but then both occurrences of\s*
in the actual regex will be restricted to match ASCII-only whitespace characters!); -
or by removing
re.IGNORECASE
from flags and replacing (inVERSION_PATTERN
) both occurrences ofa-z
withA-Za-z
plus adding suitable upper-case alternatives in thepre_l
,post_l
anddev_l
regex groups, e.g.,[aA][lL][pP][hH][aA]
in place ofalpha
(quite cumbersome...).
The whitespace issue can probably be worked around by doing that detection separately from the actual parsing.
class Version:
_regex = re.compile(VERSION_PATTERN, re.VERBOSE | re.IGNORECASE | re.ASCII)
def __init__(self, version: str) -> None:
match = self._regex.match(version.strip())
# ... The rest is the same ...
Reproducing the behavior concerning packaging.specifiers.Specifier
:
Python 3.9.7 (default, Oct 4 2021, 18:09:29)
[...]
>>> import packaging.specifiers
>>> packaging.specifiers.Specifier('==1.2+\u0130\u0131\u017f\u212a')
<Specifier('==1.2+İıſK')>
The cause is the same as the aforementioned Version
-related one, except that here it relates to (non-public) Specifier._regex_str
and Specifier._regex
regular expression definitions.
In the case of these regexes, \s
occurs in various places of the effective pattern (not only at its start and end), so here the solution based on re.ASCII
in conjunction with the .strip()
-based value preparation (proposed above by @uranusjr in regard to Version
) cannot be applied without restricting matching of white space characters (including the negated character range in the case of the ===
operator...) to ASCII-only ones; which -- I suppose -- would be too disruptive.
Instead of that, an additional check can be performed when the operator is not '==='
-- something along the lines of:
without_whitespace = ''.join(spec.split())
if not without_whitespace.isascii():
raise InvalidSpecifier...