grobid-quantities
Value expressed with alphabetic characters
For example, "twenty kilos". Recognition of such values is currently very poor, and they are not normalized into numerical values.
We should:
- add a matching feature for this very limited vocabulary in the quantity model (so that it generalizes beyond the examples in the training data),
- add a dedicated normalization.
Basic words-to-number normalization (English only) added with commit 18379a25a49b17bf61e4d9bf0cdcd3504ad90cb3
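To illustrate what this kind of normalization involves, here is a minimal sketch in Java. It is not the code from the commit above: the class name `WordsToNumberSketch` and the method `parse` are hypothetical, and it assumes plain English cardinals only (no fractions, ordinals, or irregular forms).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the grobid-quantities implementation.
final class WordsToNumberSketch {
    private static final Map<String, Long> UNITS = new HashMap<>();
    private static final Map<String, Long> SCALES = new HashMap<>();

    static {
        String[] small = {"zero", "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < small.length; i++) {
            UNITS.put(small[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            UNITS.put(tens[i], (i + 2) * 10L);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    // Converts e.g. "one hundred and twenty-three" to 123.
    static long parse(String text) {
        long total = 0;   // sum of completed thousand/million/billion groups
        long current = 0; // value of the group being read (below one thousand)
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("[\\s-]+")) {
            if (token.equals("and")) {
                continue; // "one hundred and five"
            }
            if (UNITS.containsKey(token)) {
                current += UNITS.get(token);
            } else if (token.equals("hundred")) {
                // a bare "hundred" counts as one hundred
                current = (current == 0 ? 1 : current) * 100;
            } else if (SCALES.containsKey(token)) {
                total += (current == 0 ? 1 : current) * SCALES.get(token);
                current = 0;
            } else {
                throw new IllegalArgumentException("Not a number word: " + token);
            }
        }
        return total + current;
    }
}
```

With this sketch, `WordsToNumberSketch.parse("twenty")` gives 20, which is the value needed for "twenty kilos". Unknown tokens raise an exception rather than being silently skipped, so unsupported expressions are easy to spot.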
Added a number-word matching feature to the quantity model with commit 5e0b1b72be33e0a25bdff769ec1d5f1f182e6cdc. Detection works much better now.
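For readers unfamiliar with this kind of feature, a rough sketch of the idea in Java follows. The class `NumberWordFeatureSketch`, the method `featureValue`, and the "1"/"0" encoding are assumptions made for illustration; the actual feature in the commit may be computed and encoded differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a lexicon-match feature, not the code from the commit.
final class NumberWordFeatureSketch {
    // Closed vocabulary of English number words (assumed list, not exhaustive).
    private static final Set<String> NUMBER_WORDS = new HashSet<>(Arrays.asList(
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
        "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
        "hundred", "thousand", "million", "billion"));

    // Returns "1" if the token belongs to the number-word vocabulary, else "0".
    // A sequence labeller can use this column to tag number words that it
    // never saw in the training data.
    static String featureValue(String token) {
        return NUMBER_WORDS.contains(token.toLowerCase(Locale.ROOT)) ? "1" : "0";
    }
}
```

Because the vocabulary is small and closed, a single binary feature like this lets the model treat "seventy" the same way as "twenty" even if only "twenty" appears in the annotated examples.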
FYI http://stackoverflow.com/questions/3911966/how-to-convert-number-to-words-in-java
We're doing the opposite of that Stack Overflow question: we convert words to numbers ;)
But anyway, it's almost finished. We just need to cover a couple of irregular expressions (like "oh", "dozen", "half", ...), both as a CRF feature and in the normalization.