Date model improvements

Open lfoppiano opened this issue 7 years ago • 1 comment

Some improvements to be added to the grobid-core date model:

  • [x] (design/minor) Move the date labels into TaggingLabels.java
  • ~~[ ] Add optional time information in the parsing phase (e.g. <hour>:<minutes>:<seconds>.<milliseconds> TZD) https://www.w3.org/TR/NOTE-datetime~~
  • [x] Check whether the normalisation phase could be replaced using HeidelTime: https://github.com/HeidelTime/heideltime/issues

lfoppiano, Jan 02 '17 18:01

Here is a collection of samples that could be improved:

The date "19 January 19 83" is extracted correctly but not normalised correctly:

CRF output:

19	19	1	19	19	19	9	19	19	19	LINESTART	NOCAPS	ALLDIGIT	0	0	0	NOPUNCT	<date>	I-<day>
.	.	.	.	.	.	.	.	.	.	LINEIN	ALLCAP	NODIGIT	1	0	0	DOT	<date>	I-<other>
January	january	J	Ja	Jan	Janu	y	ry	ary	uary	LINEIN	INITCAP	NODIGIT	0	0	1	NOPUNCT	<date>	I-<month>
19	19	1	19	19	19	9	19	19	19	LINEIN	NOCAPS	ALLDIGIT	0	0	0	NOPUNCT	<date>	I-<year>
83	83	8	83	83	83	3	83	83	83	LINEEND	NOCAPS	ALLDIGIT	0	0	0	NOPUNCT	<date>	<year>
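
In the output above the year is split across two tokens (`19` labelled `I-<year>`, `83` labelled `<year>`), so normalisation has to join them before building the date. Below is a minimal Python sketch of that step, not GROBID's actual code. `merge_fields`, `normalise` and the English-only `MONTHS` table are hypothetical names introduced for illustration. Joining continuation tokens without whitespace is an assumption that suits split digits but may not suit other fields.

```python
from datetime import date

# English month names only; an assumption made for this sketch.
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def merge_fields(rows):
    """Group (token, label) pairs into (label, text) fields.

    A label prefixed with "I-" opens a new field; the same label without
    the prefix continues the current field, so its token is appended.
    """
    fields = []
    for token, label in rows:
        if label.startswith("I-"):
            fields.append((label[2:], token))
        elif fields and fields[-1][0] == label:
            name, text = fields[-1]
            # Assumption: continuation tokens are joined without whitespace.
            fields[-1] = (name, text + token)
        else:
            # Continuation label with no matching open field: start a new one.
            fields.append((label, token))
    return fields


def normalise(rows):
    """Return an ISO 8601 date string, or None if a part is missing or invalid."""
    parts = {}
    for name, text in merge_fields(rows):
        parts.setdefault(name, text)  # keep the first occurrence of each field
    try:
        year = int(parts["<year>"])
        month = MONTHS[parts["<month>"].lower()]
        day = int(parts["<day>"])
        return date(year, month, day).isoformat()
    except (KeyError, ValueError):
        return None


# Token and final label columns taken from the CRF output above.
rows = [("19", "I-<day>"), (".", "I-<other>"), ("January", "I-<month>"),
        ("19", "I-<year>"), ("83", "<year>")]
print(normalise(rows))  # prints 1983-01-19
```

With the year tokens merged first, the sample normalises to 1983-01-19 rather than a date built from the fragment `19` alone.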

lfoppiano, Jan 03 '17 15:01