Regression on Python 3.12: UndefinedUnitError for string `−66.11*10**-62`
import pint
ureg = pint.UnitRegistry()
print(ureg('−66.11*10**-62'))
On Python 3.11.9:
6.611e-61
On Python 3.12.4:
Traceback (most recent call last):
File "src/temp.py", line 7, in <module>
print(ureg('−66.11*10**-62'))
^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1398, in parse_expression
return build_eval_tree(gen).evaluate(_define_op)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/pint_eval.py", line 383, in evaluate
self.left.evaluate(define_op, bin_op, un_op),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/pint_eval.py", line 383, in evaluate
self.left.evaluate(define_op, bin_op, un_op),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/pint_eval.py", line 395, in evaluate
return define_op(self.left)
^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1396, in _define_op
return self._eval_token(s, case_sensitive=case_sensitive, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1305, in _eval_token
{self.get_name(token_text, case_sensitive=case_sensitive): 1}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 658, in get_name
raise UndefinedUnitError(name_or_alias)
pint.errors.UndefinedUnitError: '−66' is not defined in the unit registry
I'm using the latest version of pint (0.24.1).
Ah, it looks like the pint_eval.tokenizer(input_string) call on line 1392 of pint/facets/plain/registry.py returns different results on the two Python versions:
Python 3.11 (working)
[TokenInfo(type=60 (ERRORTOKEN), string='−', start=(1, 0), end=(1, 1), line='−66.11*10**-62'),
 TokenInfo(type=2 (NUMBER), string='66.11', start=(1, 1), end=(1, 6), line='−66.11*10**-62'),
 TokenInfo(type=54 (OP), string='*', start=(1, 6), end=(1, 7), line='−66.11*10**-62'),
 TokenInfo(type=2 (NUMBER), string='10', start=(1, 7), end=(1, 9), line='−66.11*10**-62'),
 TokenInfo(type=54 (OP), string='**', start=(1, 9), end=(1, 11), line='−66.11*10**-62'),
 TokenInfo(type=54 (OP), string='-', start=(1, 11), end=(1, 12), line='−66.11*10**-62'),
 TokenInfo(type=2 (NUMBER), string='62', start=(1, 12), end=(1, 14), line='−66.11*10**-62'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 14), end=(1, 15), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
Python 3.12 (not working)
[TokenInfo(type=1 (NAME), string='−66', start=(1, 0), end=(1, 3), line='−66.11*10**-62'),
 TokenInfo(type=2 (NUMBER), string='.11', start=(1, 3), end=(1, 6), line='−66.11*10**-62'),
 TokenInfo(type=55 (OP), string='*', start=(1, 6), end=(1, 7), line='−66.11*10**-62'),
 TokenInfo(type=2 (NUMBER), string='10', start=(1, 7), end=(1, 9), line='−66.11*10**-62'),
 TokenInfo(type=55 (OP), string='**', start=(1, 9), end=(1, 11), line='−66.11*10**-62'),
 TokenInfo(type=55 (OP), string='-', start=(1, 11), end=(1, 12), line='−66.11*10**-62'),
 TokenInfo(type=2 (NUMBER), string='62', start=(1, 12), end=(1, 14), line='−66.11*10**-62'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 14), end=(1, 15), line='−66.11*10**-62'),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
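For anyone who wants to compare interpreters without installing pint, here is a minimal sketch using only the standard-library tokenize module. It assumes pint_eval.tokenizer is a thin wrapper around that module, which the TokenInfo objects above suggest but which I have not confirmed in the pint source. The helper name dump_tokens is made up for this example.

```python
import io
import sys
import tokenize


def dump_tokens(expression):
    """Return (token type name, token string) pairs for an expression.

    Hypothetical helper for this thread; it is not part of pint.
    """
    pairs = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(expression).readline):
            pairs.append((tokenize.tok_name[tok.type], tok.string))
    except (tokenize.TokenError, SyntaxError) as exc:
        # Defensive: some interpreter versions may reject the input outright
        # instead of emitting a token for the unexpected character.
        pairs.append(("ERROR", str(exc)))
    return pairs


if __name__ == "__main__":
    print(sys.version_info[:2])
    # First string uses the ASCII hyphen-minus, second uses U+2212.
    for expression in ("-66.11*10**-62", "\u221266.11*10**-62"):
        print(repr(expression), dump_tokens(expression)[:2])
```

Running it under 3.11 and 3.12 should show whether the first two tokens of the second string differ in the same way as in the dumps above. The ASCII string is expected to start with an OP token followed by a NUMBER token on both versions.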
I'm debugging this because it is a failing test in the unit_parse library (which depends on pint). Looking closer, though, Python 3.11 produced an invalid answer: a positive value (6.611e-61) where a negative one was expected. So while the behaviour has changed, the new behaviour is at least not misleading.
OK, and after even more debugging, I realise the minus sign is not the ASCII hyphen-minus but the Unicode minus sign, \u2212. So previously pint would silently skip that character, whereas now it is more vocal that there is an issue. This is all because Python 3.12 subtly changed the tokeniser.
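In case it helps anyone hitting the same thing downstream, one possible workaround is to normalise the character before the string reaches the parser. This is only a sketch: normalize_minus is a hypothetical helper written for this comment, not a pint API, and it handles U+2212 only, not other dash-like characters.

```python
import unicodedata

# Map U+2212 (MINUS SIGN) to the ASCII hyphen-minus that Python's tokenizer
# treats as an operator.
_MINUS_TABLE = str.maketrans({"\u2212": "-"})


def normalize_minus(expression):
    """Replace Unicode minus signs with ASCII hyphen-minus (hypothetical helper)."""
    return expression.translate(_MINUS_TABLE)


raw = "\u221266.11*10**-62"
clean = normalize_minus(raw)

print(unicodedata.name(raw[0]))  # MINUS SIGN
print(clean)  # -66.11*10**-62
```

The cleaned string could then be passed to ureg(...) in place of the original. I have not re-run that against pint 0.24.1, so treat the end-to-end result as untested.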
So I think this is a non-issue. Feel free to close if you agree.