grobid-ner Update parser using the new format and annotated data

Open lfoppiano opened this issue 7 years ago • 1 comments

We process the text to identify NER information using the newly annotated corpus:

1. Before that, we first need to identify the following items:
   - reference markers
   - references
   - equations (?)
2. We process the text with the NER parser.

Jul 26 '17 12:07 lfoppiano

I think an important design principle in general for text mining modularity is that it is up to the user/client of the API to apply the desired sequence of processing steps, not up to the module to accumulate processing and pre-processing. In other words, we build a NER module for extracting named entities from standard/plain text, and it is then up to the user of the NER module to ensure that the text matches the input specification.

So point 1 is not relevant to NER, although it could be added using grobid-core and other grobid modules for a demo. Otherwise, there would always be a ton of preprocessing, document types, custom entities, formats, etc. that we can't handle in a generic manner.
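
As a rough illustration of this separation, here is a minimal Python sketch. Every name in it (`strip_reference_markers`, `toy_ner`, `run_pipeline`) and the toy lexicon are hypothetical and made up for the example; none of it is the grobid-ner or grobid-core API. The only point is that the client chooses and orders the pre-processing, while the NER step only ever sees plain text.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface form, label)


def strip_reference_markers(text: str) -> str:
    """Hypothetical client-side step: drop bracketed markers such as [1] or [2, 3]."""
    return re.sub(r"\s*\[\d+(?:\s*,\s*\d+)*\]", "", text)


def toy_ner(text: str) -> List[Entity]:
    """Stand-in for a NER module: plain text in, entities out.

    The lexicon is invented for this example; a real module would use a trained model.
    """
    lexicon = {"Einstein": "PERSON", "Paris": "LOCATION"}
    return [(tok, lexicon[tok]) for tok in re.findall(r"\w+", text) if tok in lexicon]


def run_pipeline(
    text: str,
    steps: List[Callable[[str], str]],
    ner: Callable[[str], List[Entity]],
) -> List[Entity]:
    """The client, not the NER module, decides which steps run and in what order."""
    for step in steps:
        text = step(text)
    return ner(text)


entities = run_pipeline(
    "Einstein [1] moved to Paris [2, 3].",
    [strip_reference_markers],
    toy_ner,
)
print(entities)
```

Under this assumption, supporting a new document type means adding or swapping a step in the client's list, and the NER function itself stays unchanged.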

Jul 26 '17 12:07 kermitt2
