
Regression on Python 3.12: UndefinedUnitError for string `−66.11*10**-62`

Open bramp opened this issue 1 year ago • 3 comments

import pint

ureg = pint.UnitRegistry()
print(ureg('−66.11*10**-62'))

On Python 3.11.9:

6.611e-61

On Python 3.12.4:

Traceback (most recent call last):
  File "src/temp.py", line 7, in <module>
    print(ureg('−66.11*10**-62'))
          ^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1398, in parse_expression
    return build_eval_tree(gen).evaluate(_define_op)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/pint_eval.py", line 383, in evaluate
    self.left.evaluate(define_op, bin_op, un_op),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/pint_eval.py", line 383, in evaluate
    self.left.evaluate(define_op, bin_op, un_op),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/pint_eval.py", line 395, in evaluate
    return define_op(self.left)
           ^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1396, in _define_op
    return self._eval_token(s, case_sensitive=case_sensitive, **values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1305, in _eval_token
    {self.get_name(token_text, case_sensitive=case_sensitive): 1}
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 658, in get_name
    raise UndefinedUnitError(name_or_alias)
pint.errors.UndefinedUnitError: '−66' is not defined in the unit registry

I'm using the latest version of pint (0.24.1).

bramp, Jun 30 '24 02:06

Ah, it looks like `pint_eval.tokenizer(input_string)` (line 1392 of pint/facets/plain/registry.py) returns different results on the two Python versions:

Python 3.11 (working):

[
  TokenInfo(type=60 (ERRORTOKEN), string='−', start=(1, 0), end=(1, 1), line='−66.11*10**-62'),
  TokenInfo(type=2 (NUMBER), string='66.11', start=(1, 1), end=(1, 6), line='−66.11*10**-62'),
  TokenInfo(type=54 (OP), string='*', start=(1, 6), end=(1, 7), line='−66.11*10**-62'),
  TokenInfo(type=2 (NUMBER), string='10', start=(1, 7), end=(1, 9), line='−66.11*10**-62'),
  TokenInfo(type=54 (OP), string='**', start=(1, 9), end=(1, 11), line='−66.11*10**-62'),
  TokenInfo(type=54 (OP), string='-', start=(1, 11), end=(1, 12), line='−66.11*10**-62'),
  TokenInfo(type=2 (NUMBER), string='62', start=(1, 12), end=(1, 14), line='−66.11*10**-62'),
  TokenInfo(type=4 (NEWLINE), string='', start=(1, 14), end=(1, 15), line=''),
  TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
]

Python 3.12 (not working):

[
  TokenInfo(type=1 (NAME), string='−66', start=(1, 0), end=(1, 3), line='−66.11*10**-62'),
  TokenInfo(type=2 (NUMBER), string='.11', start=(1, 3), end=(1, 6), line='−66.11*10**-62'),
  TokenInfo(type=55 (OP), string='*', start=(1, 6), end=(1, 7), line='−66.11*10**-62'),
  TokenInfo(type=2 (NUMBER), string='10', start=(1, 7), end=(1, 9), line='−66.11*10**-62'),
  TokenInfo(type=55 (OP), string='**', start=(1, 9), end=(1, 11), line='−66.11*10**-62'),
  TokenInfo(type=55 (OP), string='-', start=(1, 11), end=(1, 12), line='−66.11*10**-62'),
  TokenInfo(type=2 (NUMBER), string='62', start=(1, 12), end=(1, 14), line='−66.11*10**-62'),
  TokenInfo(type=4 (NEWLINE), string='', start=(1, 14), end=(1, 15), line='−66.11*10**-62'),
  TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
]
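A minimal way to reproduce this comparison without pint is to call the standard library tokenizer directly and run the same script under both interpreters. This sketch assumes that pint's `pint_eval.tokenizer` builds on the stdlib `tokenize` module (an assumption, not confirmed above), and `dump_tokens` is a hypothetical helper name. Newer interpreters may reject the U+2212 character with an exception instead of returning tokens, so that case is caught and reported.

```python
import io
import sys
import tokenize


def dump_tokens(expr: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for a one-line expression.

    Hypothetical helper: uses the stdlib tokenizer directly, which is an
    assumption about what pint's pint_eval.tokenizer relies on.
    """
    readline = io.StringIO(expr).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # ASCII hyphen-minus first, then U+2212 MINUS SIGN as in the report.
    for expr in ("-66.11*10**-62", "\u221266.11*10**-62"):
        try:
            print(repr(expr), dump_tokens(expr))
        except (tokenize.TokenError, SyntaxError) as exc:
            # Some tokenizer versions may refuse the character outright.
            print(repr(expr), "->", type(exc).__name__, exc)
```

With the ASCII input the token texts are the same across versions; only the U+2212 input is expected to differ.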

bramp, Jun 30 '24 02:06

I'm debugging this because it's a failing test in the unit_parse library (which depends on pint). Looking closer, though, Python 3.11 produced an incorrect answer: a positive value (6.611e-61) instead of a negative one. So while the behaviour has changed, the new error is at least not misleading.
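For reference, the arithmetic behind the two outcomes: the intended value is negative, while dropping the leading sign yields the positive result printed on Python 3.11.

```latex
\[
-66.11 \times 10^{-62} = -6.611 \times 10^{-61}
\qquad\text{whereas}\qquad
66.11 \times 10^{-62} = +6.611 \times 10^{-61}
\]
```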

bramp, Jun 30 '24 02:06

OK, after even more debugging I realise the minus sign is not ASCII but the Unicode MINUS SIGN character, U+2212 (`\u2212`). Previously pint silently skipped it (Python 3.11 tokenizes it as an ERRORTOKEN); now it is more vocal that there is an issue. This is all because Python 3.12 subtly changed the tokeniser.
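One possible caller-side workaround is to find non-ASCII characters in the input and map U+2212 to the ASCII hyphen-minus before handing the string to pint. This is a sketch using only the standard library; `non_ascii_chars`, `normalize_minus`, and the `_MINUS_LOOKALIKES` table are hypothetical names, not part of pint or unit_parse.

```python
import unicodedata

# Hypothetical translation table: only U+2212 MINUS SIGN is mapped here.
_MINUS_LOOKALIKES = str.maketrans({"\u2212": "-"})


def non_ascii_chars(text: str) -> list[tuple[int, str, str]]:
    """List (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if not ch.isascii()
    ]


def normalize_minus(text: str) -> str:
    """Replace U+2212 MINUS SIGN with the ASCII hyphen-minus."""
    return text.translate(_MINUS_LOOKALIKES)


if __name__ == "__main__":
    raw = "\u221266.11*10**-62"
    print(non_ascii_chars(raw))  # [(0, '−', 'MINUS SIGN')]
    print(normalize_minus(raw))  # -66.11*10**-62
```

The normalized string can then be passed to `ureg(...)`; with an ASCII minus the expression should parse as a negative quantity regardless of the tokenizer changes. Mapping only U+2212 is deliberate, since other dash-like characters (such as the en dash) are ambiguous in free text.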

So I think this is a non-issue. Feel free to close if you agree.

bramp, Jun 30 '24 03:06