grobid-quantities
Unit parsing: full-name units, full names with inflections
Since we moved from "lexical mapping" (not to say rules!) to a CRF parser to process and normalize unit expressions, full-name units are not covered by the unit parser, e.g. "hours" in "2 hours".
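As an illustration of what a lexicon-based fallback for full-name units could look like, here is a minimal sketch, assuming a small hand-written lexicon. The class `FullNameUnitLexicon`, its entries and the English-only suffix stripping are all hypothetical and not the grobid-quantities API. It strips a spelled-out SI prefix and a plural ending, then maps the remaining singular name to a notation the existing unit parser already handles (e.g. "hours" to "h", "kilometres" to "km").

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: not part of grobid-quantities.
final class FullNameUnitLexicon {

    // Hypothetical lexicon: singular full unit name -> unit notation.
    private static final Map<String, String> NAMES = Map.of(
            "second", "s",
            "minute", "min",
            "hour", "h",
            "day", "d",
            "year", "yr",
            "metre", "m",
            "meter", "m",
            "gram", "g");

    // Hypothetical spelled-out SI prefixes -> prefix symbol.
    private static final Map<String, String> PREFIXES = Map.of(
            "kilo", "k",
            "centi", "c",
            "milli", "m",
            "micro", "\u00b5");

    // English plural endings tried when stripping inflections (English only).
    private static final List<String> SUFFIXES = List.of("es", "s");

    /** Returns the notation for a full-name unit, or empty if it is unknown. */
    static Optional<String> resolve(String raw) {
        String word = raw.trim().toLowerCase(Locale.ROOT);
        // First try the word as-is (possibly inflected), without any prefix.
        Optional<String> bare = lookupInflected(word);
        if (bare.isPresent()) {
            return bare;
        }
        // Then try stripping each known prefix and resolving the remainder.
        for (Map.Entry<String, String> p : PREFIXES.entrySet()) {
            if (word.startsWith(p.getKey())) {
                Optional<String> base = lookupInflected(word.substring(p.getKey().length()));
                if (base.isPresent()) {
                    return Optional.of(p.getValue() + base.get());
                }
            }
        }
        return Optional.empty();
    }

    // Looks up the word directly, then with each plural suffix removed.
    private static Optional<String> lookupInflected(String word) {
        if (NAMES.containsKey(word)) {
            return Optional.of(NAMES.get(word));
        }
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix)) {
                String stem = word.substring(0, word.length() - suffix.length());
                if (NAMES.containsKey(stem)) {
                    return Optional.of(NAMES.get(stem));
                }
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, `resolve("hours")` yields `h` and `resolve("kilometres")` yields `km`, while an unknown word returns an empty `Optional`, so the caller can still raise the normalization warning shown below.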
[WARN ] org.grobid.core.engines.QuantityParser: Could not normalize the value: 2.
org.grobid.core.data.normalization.NormalizationException: The unit Unit{rawName='hours', offsets=661 666, productBlock=null} cannot be normalized. It is either not a valid unit or it is not recognized from the available parsers.
Should work now. The only glitch is that the value is normalized to seconds, which is not ideal for large units such as year/years.
See example:
Unusual prefix-inflection combinations now also work in English.
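Regarding the glitch that durations end up normalized to seconds, one possible workaround is to re-express the normalized value in the largest unit that keeps it at or above 1. This is a minimal sketch, assuming a Julian year of 365.25 days; the class `DurationDisplay` and its unit table are hypothetical and not part of grobid-quantities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not part of grobid-quantities.
final class DurationDisplay {

    // Seconds per unit, largest first (insertion order is preserved).
    // The Julian year (365.25 days) is an assumption.
    private static final Map<String, Double> SECONDS_PER_UNIT = new LinkedHashMap<>();
    static {
        SECONDS_PER_UNIT.put("yr", 365.25 * 24 * 3600);
        SECONDS_PER_UNIT.put("d", 24.0 * 3600);
        SECONDS_PER_UNIT.put("h", 3600.0);
        SECONDS_PER_UNIT.put("min", 60.0);
        SECONDS_PER_UNIT.put("s", 1.0);
    }

    /** Expresses a value in seconds using the largest unit for which it is at least 1. */
    static String humanize(double seconds) {
        for (Map.Entry<String, Double> e : SECONDS_PER_UNIT.entrySet()) {
            double v = seconds / e.getValue();
            if (Math.abs(v) >= 1.0) {
                return v + " " + e.getKey();
            }
        }
        // Sub-second values stay in seconds.
        return seconds + " s";
    }
}
```

For example, `humanize(7200)` gives `2.0 h` and `humanize(94672800)` gives `3.0 yr`. The normalized value in seconds is kept for comparisons, and only the displayed form changes.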

TODO: we need to migrate the lexicon to a JSON-based input file and add support for French and German (see #14).
The lexicon has been migrated. We still need to add support for French and German (to be checked whether the notation changes between languages).