Date model improvements
Some improvements to be added to the grobid-core date model:
- [x] (design/minor) Move the date labels into `TaggingLabels.java`
- [ ] ~~Add optional time information in the parsing phase (e.g. `<hour>:<minutes>:<seconds>.<milliseconds>TZD`), see https://www.w3.org/TR/NOTE-datetime~~
- [x] Check whether the normalisation phase could be replaced using https://github.com/HeidelTime/heideltime/issues
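For reference, the time layout mentioned in the list above (hours, minutes, optional seconds and fraction, followed by a time zone designator) could be matched with something like the sketch below. This is only an illustration in Python, not grobid-core code; `TIME_RE` and `parse_time` are hypothetical names, and the pattern assumes the time part is always followed by a `TZD` of the form `Z`, `+hh:mm` or `-hh:mm`.

```python
import re

# Hypothetical pattern for hh:mm[:ss[.fraction]]TZD, where TZD is
# "Z" or a signed hh:mm offset.
TIME_RE = re.compile(
    r"(?P<hour>[01]\d|2[0-3]):(?P<minute>[0-5]\d)"
    r"(?::(?P<second>[0-5]\d)(?:\.(?P<fraction>\d+))?)?"
    r"(?P<tzd>Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"
)


def parse_time(text):
    """Return the matched time components as a dict, or None if no match."""
    match = TIME_RE.fullmatch(text)
    if match is None:
        return None
    # Drop the optional groups that did not take part in the match.
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

Optional components that are absent from the input are simply left out of the returned dictionary, so a caller can tell `19:20Z` apart from `19:20:00Z`.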
Here is a collection of samples that could be improved:
`19 January 19 83` is not correctly normalised, though it is correctly extracted:
CRF output:
19 19 1 19 19 19 9 19 19 19 LINESTART NOCAPS ALLDIGIT 0 0 0 NOPUNCT <date> I-<day>
. . . . . . . . . . LINEIN ALLCAP NODIGIT 1 0 0 DOT <date> I-<other>
January january J Ja Jan Janu y ry ary uary LINEIN INITCAP NODIGIT 0 0 1 NOPUNCT <date> I-<month>
19 19 1 19 19 19 9 19 19 19 LINEIN NOCAPS ALLDIGIT 0 0 0 NOPUNCT <date> I-<year>
83 83 8 83 83 83 3 83 83 83 LINEEND NOCAPS ALLDIGIT 0 0 0 NOPUNCT <date> <year>
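
One possible way to handle this case is to join all tokens that share a label before normalising, so that the year fragments `19` and `83` become `1983`. The sketch below is written in Python for brevity and is not the grobid-core implementation; `collect_fields` and `normalise` are hypothetical names. It assumes the token is the first column and the label the last column of each CRF line, that `I-` marks the start of a field, and that the sequence contains a single date with an English month name.

```python
import datetime

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}


def collect_fields(crf_lines):
    """Group tokens by label, e.g. {'day': ['19'], 'year': ['19', '83']}."""
    fields = {}
    for line in crf_lines:
        columns = line.split()
        if not columns:
            continue
        token, label = columns[0], columns[-1]
        if label.startswith("I-"):  # "I-" marks the first token of a field
            label = label[2:]
        fields.setdefault(label.strip("<>"), []).append(token)
    return fields


def normalise(fields):
    """Return an ISO 8601 date string, or None if the fields are unusable."""
    # Joining the fragments is what repairs a year split as "19" + "83".
    year_text = "".join(fields.get("year", []))
    month_text = "".join(fields.get("month", [])).lower()
    day_text = "".join(fields.get("day", []))
    month = MONTHS.get(month_text)
    if month is None or len(year_text) != 4:
        return None
    try:
        return datetime.date(int(year_text), month, int(day_text)).isoformat()
    except ValueError:
        return None


SAMPLE = [
    "19 19 1 19 19 19 9 19 19 19 LINESTART NOCAPS ALLDIGIT 0 0 0 NOPUNCT <date> I-<day>",
    ". . . . . . . . . . LINEIN ALLCAP NODIGIT 1 0 0 DOT <date> I-<other>",
    "January january J Ja Jan Janu y ry ary uary LINEIN INITCAP NODIGIT 0 0 1 NOPUNCT <date> I-<month>",
    "19 19 1 19 19 19 9 19 19 19 LINEIN NOCAPS ALLDIGIT 0 0 0 NOPUNCT <date> I-<year>",
    "83 83 8 83 83 83 3 83 83 83 LINEEND NOCAPS ALLDIGIT 0 0 0 NOPUNCT <date> <year>",
]

print(normalise(collect_fields(SAMPLE)))  # prints 1983-01-19
```

Joining by label keeps the normalisation step independent of how the tokenizer split the input, at the cost of merging fields if a sequence ever contained more than one date.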