quantulum3

connects abbreviations - interpret unusual words as unit

Open liarig opened this issue 6 years ago • 4 comments

Describe the bug

The parser joins abbreviations together into a compound unit, which doesn't make sense.

>>> from quantulum3 import parser
>>> parser.parse('1 pplga')
[Quantity(1, "Unit(name="pint pint litre gigayear", entity=Entity("unknown"), uri=None)")]

Expected behavior

>>> parser.parse('1 pplga')
[Quantity(1, "Unit(name="dimensionless", entity=Entity("dimensionless"), uri=Dimensionless_quantity)")]

liarig, Jun 04 '19 22:06

Thanks for your issue. The behaviour you describe is expected: the tool interprets everything that is not a common English word as a unit. Do you have a proposal to improve this behaviour? Maybe one could disregard all units in which the same unit appears twice. But sometimes this is wanted, e.g. km², which could be written as km*km.
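A minimal sketch of the "same unit appears twice" idea, for illustration only. It assumes a compound unit name is a space-separated list of component names, as in the `pint pint litre gigayear` output above; `has_repeated_component` is a hypothetical helper, not part of quantulum3.

```python
from collections import Counter


def has_repeated_component(unit_name: str) -> bool:
    """Return True if any space-separated component occurs more than once."""
    counts = Counter(unit_name.split())
    return any(n > 1 for n in counts.values())


# The compound from the report contains "pint" twice, so it would be rejected.
print(has_repeated_component("pint pint litre gigayear"))
# A hypothetical km*km spelling would be rejected as well, which is the drawback.
print(has_repeated_component("kilometre kilometre"))
```

The second call shows the concern raised above: a plain repetition check cannot tell a nonsense compound from a legitimate squared unit.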

nielstron, Jun 05 '19 06:06

Thank you for your response. I think the case where the same unit appears more than once should be accepted only if that unit can be multidimensional (as in your example: length, squared). Otherwise the result may be disregarded.
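One way this refinement could look, as a sketch under stated assumptions: the `REPEATABLE` set and `is_plausible_compound` are hypothetical names, the set contents are only examples, and unit names are again assumed to be space-separated component names.

```python
from collections import Counter

# Hypothetical whitelist of components that may repeat (lengths give areas or volumes).
REPEATABLE = {"metre", "kilometre", "inch", "foot"}


def is_plausible_compound(unit_name: str, repeatable=REPEATABLE) -> bool:
    """Accept a compound unless a non-repeatable component occurs more than once."""
    counts = Counter(unit_name.split())
    return all(n == 1 or name in repeatable for name, n in counts.items())


print(is_plausible_compound("pint pint litre gigayear"))  # "pint" repeats, not whitelisted
print(is_plausible_compound("kilometre kilometre"))       # repeat of a length unit
print(is_plausible_compound("gram inch"))                 # no repeats at all
```

Note that a compound without any repeated component, such as `gram inch`, passes this check untouched, so it would need a separate filter.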

Interpreting different abbreviations written together as a compound measure may lead to mistakes.

>>> parser.parse('a gin')
[Quantity(1, "Unit(name="gram inch", entity=Entity("unknown"), uri=None)")]

liarig, Jun 05 '19 09:06

> only if this unit may be multidimensional

On what basis would this then be decided? I can only imagine storing, for every value, whether multidimensional cases exist or not, which sounds to me like a huge overhead and error-prone.

> Interpreting different abbreviations written together as a compound measure may lead to mistakes.

Currently, the 10,000 most common words of the English language are excluded from being interpreted as units. If you find additional common words (ideally a whole list of them) or have a better idea for filtering, I'd be glad to integrate them.
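For readers following along, a common-word filter of this kind can be sketched as below. This is not the library's actual implementation: `COMMON_WORDS` is a tiny stand-in for a real frequency list, `could_be_unit` is a hypothetical name, and including "gin" in the set is an assumption made purely to show the effect of extending the list.

```python
# Stand-in for a real list of frequent English words (assumed contents).
COMMON_WORDS = {"a", "the", "in", "gin"}


def could_be_unit(token: str, common_words=COMMON_WORDS) -> bool:
    """Treat a token as a unit candidate only if it is not a common word."""
    return token.lower() not in common_words


print(could_be_unit("gin"))    # listed as a common word, so not a unit candidate
print(could_be_unit("pplga"))  # not a word at all, so it still passes the filter
```

Extending the word list would help with inputs like `a gin`, but a non-word such as `pplga` passes any word-list filter and would need a different criterion.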

nielstron, Jun 06 '19 06:06

Actually, this is in some form a duplicate of #35.

nielstron, Jul 23 '19 20:07