Trying to parse inches using double-quote symbol throws ImportError about stemming

Open adam-funk opened this issue 3 years ago • 2 comments

Describe the bug

Trying to parse a string containing a double quote at the end of a number (meaning inches) throws an ImportError about a stemming requirement.

To Reproduce

>>> import quantulum3
>>> from quantulum3 import parser as qp
>>> qp.parse('supplied with 3.5" guidewire')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    qp.parse('supplied with 3.5" guidewire')
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/parser.py", line 450, in parse
    unit, unit_shortening = get_unit(item, text)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/parser.py", line 328, in get_unit
    base = dis.disambiguate_unit(unit_surface, text, lang)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/disambiguate.py", line 18, in disambiguate_unit
    base = clf.disambiguate_unit(unit_surface, text, lang).name
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/classifier.py", line 258, in disambiguate_unit
    transformed = classifier(lang).tfidf_model.transform([clean_text(text, lang)])
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/classifier.py", line 100, in clean_text
    return _get_classifier(lang).clean_text(text)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/_lang/en_US/classifier.py", line 24, in clean_text
    raise ImportError("Module stemming is not installed.")
ImportError: Module stemming is not installed.
>>> qp.parse('supplied with 3.5 inch guidewire')
[Quantity(3.5, "Unit(name="inch", entity=Entity("length"), uri=Inch)")]

Expected behavior

  1. not throwing the exception
  2. identifying number 3.5
  3. ideally, identifying the unit as inch

Additional information:

  • Python Version: 3.8.10 in anaconda (tested in idle3 and jupyter notebook)
  • Classifier activated/ sklearn installed: [yes/no]
  • OS: Ubuntu 21.04
  • quantulum3 0.7.9
  • sklearn 0.24.2
  • scipy 1.7.1
  • numpy 1.20.3

adam-funk, Oct 21 '21 15:10

Is the issue resolved by installing stemming? I'm not sure whether it is actually declared as a requirement of the package.

nielstron, Oct 21 '21 18:10
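
The traceback in the report shows the en_US classifier raising ImportError when the stemming module cannot be imported. A quick way to check an environment before calling the parser is sketched below. The helper name `missing_optional_modules` is hypothetical (not part of quantulum3), and the default module list is an assumption based only on the modules mentioned in this thread.

```python
import importlib.util


def missing_optional_modules(names=("stemming", "sklearn")):
    """Return the module names from `names` that cannot be located.

    Hypothetical helper, not part of quantulum3. The default names are
    an assumption taken from the modules mentioned in this issue.
    """
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_optional_modules()
if missing:
    print("Missing optional modules:", ", ".join(missing))
else:
    print("All listed optional modules were found.")
```

If `stemming` shows up as missing, `pip install stemming` is the likely fix, since that is what resolved the exception in this thread.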

It fixes the exception, thanks! The unit isn't always right, but I see that it's using context.

>>> import stemming
>>> import quantulum3
>>> from quantulum3 import parser as qp
>>> qp.parse('supplied with 3.5" guidewire')
[Quantity(3.5, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
>>> qp.parse('supplied with 3.5" long guidewire')
[Quantity(3.5, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
>>> qp.parse('supplied with 3.5" wide guidewire')
[Quantity(3.5, "Unit(name="inch", entity=Entity("length"), uri=Inch)")]

adam-funk, Oct 22 '21 08:10
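
The outputs above show the bare `"` being resolved from context, sometimes as seconds of arc, while the spelled-out form `3.5 inch` was parsed as a length in the original report. One possible workaround, assuming that in your data a double quote directly after a number always means inches, is to rewrite the text before passing it to the parser. The function `normalize_inches` is a hypothetical helper, not part of quantulum3, and this is only a sketch.

```python
import re

# A number (integer or decimal) immediately followed by a double quote.
# Assumption: in the input text this pattern always denotes inches.
_INCH_MARK = re.compile(r'(\d+(?:\.\d+)?)"')


def normalize_inches(text):
    """Replace a double quote after a number with the word 'inch'.

    Hypothetical preprocessing helper, not part of quantulum3.
    """
    return _INCH_MARK.sub(r"\1 inch", text)


print(normalize_inches('supplied with 3.5" guidewire'))
```

The result can then be handed to `qp.parse`. Note the caveat: a quoted phrase that ends in a digit, such as `"version 3"`, would also be rewritten, so this only suits text where that does not occur.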

Closing this now :)

nielstron, Dec 31 '22 18:12