pint throws unexpected AttributeError on input data `⅓`

Open rec opened this issue 2 years ago • 1 comments

Hello! It's been a long time!! :-)

This little issue isn't blocking anything but I thought you might want to know.

We are running pint over a large number of tokens in a corpus to find which ones we can recognize as units.

We catch pint.PintError and use that to identify non-units.

However, in one case, pint throws an AttributeError!

>>> unit.Quantity('1/3')
<Quantity(0.333333333, 'dimensionless')>

>>> unit.Quantity('xxx')

  pint.errors.UndefinedUnitError: 'xxx' is not defined in the unit registry

So far so good, but:

>>> unit.Quantity('⅓')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tom/.virtualenvs/engora/lib/python3.8/site-packages/pint/quantity.py", line 266, in __new__
    inst = ureg.parse_expression(value)
  File "/Users/tom/.virtualenvs/engora/lib/python3.8/site-packages/pint/registry.py", line 1340, in parse_expression
    return build_eval_tree(gen).evaluate(
AttributeError: 'NoneType' object has no attribute 'evaluate'

The issue might be because:

>>> '⅓'.isnumeric(), '⅓'.isdecimal(), '1/3'.isnumeric()
(True, False, False)

which I only discovered a few minutes ago.
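
For reference, the three string predicates map to different Unicode numeric properties. The sketch below uses only the standard library to put them side by side; the helper name `describe` is made up for illustration and is not part of pint.

```python
import unicodedata


def describe(ch):
    """Summarise how Python classifies a single character as a number."""
    return {
        "isdecimal": ch.isdecimal(),   # usable as a base-10 digit
        "isdigit": ch.isdigit(),       # decimal digits plus e.g. superscript digits
        "isnumeric": ch.isnumeric(),   # anything with a Unicode numeric value
        "category": unicodedata.category(ch),
        "value": unicodedata.numeric(ch, None),  # None if there is no numeric value
    }


for ch in ("3", "⅓", "x"):
    print(repr(ch), describe(ch))
```

`⅓` has a numeric value but is neither a decimal nor a digit, so `float('⅓')` raises `ValueError` even though `'⅓'.isnumeric()` is true.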

I can't find any use of isnumeric in your code, though you might be using regexes that behave like isnumeric, or perhaps it's something else entirely.

For entertainment, in a Python terminal session you can throw away, type:

>>> print(''.join(chr(i) for i in range(0x1000) if chr(i).isnumeric()))

rec avatar Dec 03 '21 10:12 rec

It was a bit tricky to track down, but it turns out that tokenize.tokenize does not recognize the character as a number token. It's already pretty hacky, and fixing mixed numbers would probably be a mess, so I think those should be pre-processed (e.g. turn '4⅑' into '(4+⅑)').
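
A minimal way to see the tokenizer behaviour described here, using only the standard library. The helper `token_types` is hypothetical, and what the tokenizer does with an unrecognised character differs between Python versions (an error token in some, an exception in others), so the sketch handles both cases.

```python
import io
import tokenize


def token_types(source):
    """Return the token type names that Python's tokenizer produces for source."""
    names = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            names.append(tokenize.tok_name[tok.type])
    except (tokenize.TokenError, SyntaxError) as exc:
        # Some Python versions raise instead of yielding an error token.
        names.append(f"<{type(exc).__name__}>")
    return names


print(token_types("1/3"))  # contains NUMBER tokens
print(token_types("⅓"))    # contains no NUMBER token
```

Either way, no NUMBER token comes out for `⅓`, which fits the failure seen in the traceback above.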

OrangeChannel avatar Dec 15 '21 15:12 OrangeChannel

Use preprocessors for this. Feel free to reopen.
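
One possible shape for such a preprocessor, as a sketch only. The function name `expand_vulgar_fractions` is made up here, and it relies on Unicode NFKC normalization turning a character like `⅓` into `1`, FRACTION SLASH, `3`. A run of digits directly before the fraction is treated as the whole part of a mixed number.

```python
import re
import unicodedata

_FRACTION_SLASH = "\u2044"
_DECOMPOSED = re.compile(r"\d+" + _FRACTION_SLASH + r"\d+")
# Optional run of digits followed by one non-ASCII character.
_CANDIDATE = re.compile(r"(\d*)([^\x00-\x7f])")


def _ascii_fraction(ch):
    """Return e.g. '1/3' for '⅓', or None if ch is not a one-character fraction."""
    decomposed = unicodedata.normalize("NFKC", ch)
    if _DECOMPOSED.fullmatch(decomposed):
        return decomposed.replace(_FRACTION_SLASH, "/")
    return None


def expand_vulgar_fractions(text):
    """Rewrite '⅓' as '(1/3)' and mixed numbers like '4⅑' as '(4+1/9)'."""

    def repl(match):
        whole, ch = match.groups()
        frac = _ascii_fraction(ch)
        if frac is None:
            return match.group(0)  # some other non-ASCII character: leave it alone
        return f"({whole}+{frac})" if whole else f"({frac})"

    return _CANDIDATE.sub(repl, text)


print(expand_vulgar_fractions("⅓"))
print(expand_vulgar_fractions("4⅑"))
```

Assuming the `preprocessors` argument of `pint.UnitRegistry`, this could be hooked up as `pint.UnitRegistry(preprocessors=[expand_vulgar_fractions])`. It has not been checked against a particular pint version, and inputs such as a decimal followed by a fraction character are not handled sensibly.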

hgrecco avatar Apr 27 '23 03:04 hgrecco