pint throws unexpected AttributeError on input data `⅓`
Hello! It's been a long time!! :-)
This little issue isn't blocking anything but I thought you might want to know.
We are running pint over a large number of tokens in a corpus to find which ones we can recognize as units.
We catch pint.PintError and use that to identify non-units. However, in one case, pint throws an AttributeError!
>>> unit.Quantity('1/3')
<Quantity(0.333333333, 'dimensionless')>
>>> unit.Quantity('xxx')
pint.errors.UndefinedUnitError: 'xxx' is not defined in the unit registry
So far so good, but:
>>> unit.Quantity('⅓')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tom/.virtualenvs/engora/lib/python3.8/site-packages/pint/quantity.py", line 266, in __new__
    inst = ureg.parse_expression(value)
  File "/Users/tom/.virtualenvs/engora/lib/python3.8/site-packages/pint/registry.py", line 1340, in parse_expression
    return build_eval_tree(gen).evaluate(
AttributeError: 'NoneType' object has no attribute 'evaluate'
The issue might be because:
>>> '⅓'.isnumeric(), '⅓'.isdecimal(), '1/3'.isnumeric()
(True, False, False)
which I only discovered a few minutes ago.
I can't find an example of isnumeric in your code, though you might be using regexes that use isnumeric, or perhaps it's something else entirely.
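To make the distinction above concrete, here is a small stdlib-only sketch comparing the three string predicates and looking up the Unicode numeric value of a character. The helper name `numeric_value` is made up for illustration and is not part of pint.

```python
import unicodedata

# Compare the three str predicates for a vulgar fraction, a plain digit,
# and the ASCII spelling of the fraction.
for text in ("⅓", "3", "1/3"):
    print(repr(text), text.isnumeric(), text.isdecimal(), text.isdigit())


def numeric_value(ch):
    """Return the Unicode numeric value of one character, or None if it has none.

    Hypothetical helper for illustration only.
    """
    return unicodedata.numeric(ch, None)


# '⅓' carries a numeric value in the Unicode database even though it is
# neither a decimal digit nor something int() or float() will accept.
print(numeric_value("⅓"))
print(numeric_value("x"))
```

So a character can be "numeric" in the Unicode sense without being a number literal that Python's own tokenizer will accept.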
For entertainment, in a Python terminal session you can throw away, type:
>>> print(''.join(chr(i) for i in range(0x1000) if chr(i).isnumeric()))
Bit tricky to track it down, but it turns out it was tokenize.tokenize not recognizing the character as a number type. It's already pretty hacky but fixing mixed numbers would probably be a mess, so I think those should be pre-processed (i.e. turn '4⅑' into '(4+⅑)').
Use preprocessors for this. Feel free to reopen.
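For anyone landing here later, a minimal sketch of such a preprocessor follows. It is not pint code: the function name `expand_vulgar_fractions` is hypothetical, and it assumes you want the fraction rewritten in ASCII (so '4⅑' becomes '(4+1/9)' rather than '(4+⅑)'), since the bare '⅑' is what trips the tokenizer in the first place. It only handles ASCII digits directly followed by a single vulgar fraction character.

```python
import re
import unicodedata
from fractions import Fraction

# Optional ASCII whole-number part followed by one vulgar fraction character
# (U+00BC-U+00BE, U+2150-U+215E, U+2189).
_VULGAR = re.compile(r"([0-9]*)([\u00BC-\u00BE\u2150-\u215E\u2189])")


def _replace(match):
    whole, ch = match.group(1), match.group(2)
    # unicodedata.numeric() returns a float; recover the small exact fraction.
    frac = Fraction(unicodedata.numeric(ch)).limit_denominator(100)
    ascii_frac = f"{frac.numerator}/{frac.denominator}"
    return f"({whole}+{ascii_frac})" if whole else f"({ascii_frac})"


def expand_vulgar_fractions(text):
    """Rewrite vulgar fractions as parenthesized ASCII expressions."""
    return _VULGAR.sub(_replace, text)


print(expand_vulgar_fractions("4⅑ m"))
print(expand_vulgar_fractions("⅓"))

# Assumed usage with pint (not exercised here):
#   ureg = pint.UnitRegistry(preprocessors=[expand_vulgar_fractions])
```

Text without vulgar fractions passes through unchanged, so existing inputs should not be affected.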