quantulum3
connects abbreviations - interpret unusual words as unit
Describe the bug
The parser connects abbreviations together into a compound unit, which doesn't make sense.
>>> from quantulum3 import parser
>>> parser.parse('1 pplga')
[Quantity(1, "Unit(name="pint pint litre gigayear", entity=Entity("unknown"), uri=None)")]
Expected behavior
>>> parser.parse('1 pplga')
[Quantity(1, "Unit(name="dimensionless", entity=Entity("dimensionless"), uri=Dimensionless_quantity)")]
Thanks for your issue. The behaviour you describe is expected. The tool interprets everything that is not a common English word as a unit. Do you have a proposal to improve this behaviour? Maybe one could disregard all units in which the same unit appears twice. But sometimes this is wanted, e.g. km², which could be written as km*km.
Thank you for your response. I think a unit that appears more than once should be accepted only if it can be multidimensional (as in your example: length, squared). Otherwise it can be disregarded.
Interpreting different abbreviations written together as a compound measure may lead to mistakes.
>>> parser.parse('a gin')
[Quantity(1, "Unit(name="gram inch", entity=Entity("unknown"), uri=None)")]
only if this unit may be multidimensional
On what basis would this then be decided? I can only imagine storing, for every value, whether there are multidimensional cases or not, which sounds to me like a huge overhead and prone to errors.
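To make the trade-off concrete, here is a minimal sketch of the repeated-unit rule discussed above. It is not quantulum3 code: the function name `is_plausible_compound` and the `POWER_CAPABLE` allow-list are hypothetical, and the list contents are only an assumption about which units can sensibly be squared or cubed. It also shows the per-unit bookkeeping the rule would need.

```python
from collections import Counter

# Hypothetical allow-list (an assumption, not part of quantulum3):
# names of units that can sensibly appear more than once, e.g. km*km.
POWER_CAPABLE = {"metre", "kilometre", "inch", "second"}


def is_plausible_compound(unit_names, power_capable=POWER_CAPABLE):
    """Reject a compound unit if any unit repeats without being allow-listed."""
    counts = Counter(unit_names)
    return all(n == 1 or name in power_capable for name, n in counts.items())


# 'pplga' was read as pint * pint * litre * gigayear: 'pint' repeats.
print(is_plausible_compound(["pint", "pint", "litre", "gigayear"]))  # False
# km*km is a legitimate repeated unit.
print(is_plausible_compound(["kilometre", "kilometre"]))  # True
```

Note that this rule alone would not catch 'a gin', since "gram inch" contains no repeated unit.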
Interpreting different abbreviations written together as a compound measure may lead to mistakes.
Currently, the 10,000 most common words of the English language are excluded from being considered as possible units. If you find additional words that are common (in the best case a whole list of them) or have a better idea for filtering, I'd be glad to integrate them.
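The common-word filter described here can be sketched roughly as follows. This is only an illustration, not the library's implementation: `COMMON_WORDS` is a tiny hypothetical stand-in for the real word list, `candidate_unit_tokens` is an invented helper, and "gin" is included purely to show the effect of extending the list.

```python
# Hypothetical stand-in for the list of common English words (an assumption);
# "gin" is added here only to show what extending the list would do.
COMMON_WORDS = {"a", "the", "in", "gin"}


def candidate_unit_tokens(text, common_words=COMMON_WORDS):
    """Return the tokens that are not common words and may still be tried as units."""
    return [tok for tok in text.split() if tok.lower() not in common_words]


print(candidate_unit_tokens("a gin"))    # []
print(candidate_unit_tokens("1 pplga"))  # ['1', 'pplga']
```

With "gin" on the list, 'a gin' leaves nothing to interpret as a unit, while an unusual string such as 'pplga' still passes through and needs a different check.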
Actually, this is in some form a duplicate of #35.