quantulum3
Trying to parse inches using double-quote symbol throws ImportError about stemming
Describe the bug
Trying to parse a string containing a double-quote at the end of a number (meaning inches) throws an ImportError about a stemming requirement.
To Reproduce
>>> import quantulum3
>>> from quantulum3 import parser as qp
>>> qp.parse('supplied with 3.5" guidewire')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    qp.parse('supplied with 3.5" guidewire')
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/parser.py", line 450, in parse
    unit, unit_shortening = get_unit(item, text)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/parser.py", line 328, in get_unit
    base = dis.disambiguate_unit(unit_surface, text, lang)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/disambiguate.py", line 18, in disambiguate_unit
    base = clf.disambiguate_unit(unit_surface, text, lang).name
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/classifier.py", line 258, in disambiguate_unit
    transformed = classifier(lang).tfidf_model.transform([clean_text(text, lang)])
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/classifier.py", line 100, in clean_text
    return _get_classifier(lang).clean_text(text)
  File "/home/adam/anaconda3/envs/scne/lib/python3.8/site-packages/quantulum3/_lang/en_US/classifier.py", line 24, in clean_text
    raise ImportError("Module stemming is not installed.")
ImportError: Module stemming is not installed.
>>> qp.parse('supplied with 3.5 inch guidewire')
[Quantity(3.5, "Unit(name="inch", entity=Entity("length"), uri=Inch)")]
Expected behavior
- not throwing the exception
- identifying number 3.5
- ideally, identifying the unit as inch
Additional information:
- Python Version: 3.8.10 in anaconda (tested in idle3 and jupyter notebook)
- Classifier activated / sklearn installed: yes
- OS: Ubuntu 21.04
- quantulum3 0.7.9
- sklearn 0.24.2
- scipy 1.7.1
- numpy 1.20.3
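To check whether the modules involved are importable in the active environment, a quick standard-library probe along these lines may help. This is only a sketch: `missing_modules` is a hypothetical helper, not part of quantulum3, and the module names passed to it are taken from the traceback and the version list above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper for illustration; it is not part of quantulum3.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'stemming' is the module named in the ImportError above;
# 'sklearn' is the classifier dependency listed in this report.
print(missing_modules(["stemming", "sklearn"]))
```

An empty list means both modules can be found; any name that is printed would need to be installed before the classifier path can run.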
Is the issue resolved by installing stemming? I'm not sure whether it is actually required by the package.
It fixes the exception, thanks! The unit isn't always right, but I see that it's using context.
>>> import stemming
>>> import quantulum3
>>> from quantulum3 import parser as qp
>>> qp.parse('supplied with 3.5" guidewire')
[Quantity(3.5, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
>>> qp.parse('supplied with 3.5" long guidewire')
[Quantity(3.5, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]
>>> qp.parse('supplied with 3.5" wide guidewire')
[Quantity(3.5, "Unit(name="inch", entity=Entity("length"), uri=Inch)")]
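Since the spelled-out form `3.5 inch` was parsed as inch in the first transcript, one possible workaround for the context-dependent result is to rewrite the double-quote mark before calling the parser. The sketch below is an assumption-laden preprocessing step, not something quantulum3 provides: `spell_out_inches` is a hypothetical name, and the regular expression treats every `"` that directly follows a digit as an inch mark.

```python
import re

# Assumption: a double quote directly after a digit always means inches.
_INCH_MARK = re.compile(r'(\d)\s*"')


def spell_out_inches(text):
    """Replace a trailing double quote after a number with the word 'inch'.

    Hypothetical helper for illustration; it is not part of quantulum3.
    """
    return _INCH_MARK.sub(r"\1 inch", text)


print(spell_out_inches('supplied with 3.5" guidewire'))
# prints: supplied with 3.5 inch guidewire
```

The result can then be passed to `qp.parse`. The rewrite is deliberately naive: it would also alter a closing quotation mark that happens to follow a digit, and it would misread text where `"` really does mean seconds of arc, so it only suits input where inches are the expected reading.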
Closing this now :)