grobid-ner
Update parser using the new format and annotated data
We process the text to identify named entities, using the newly annotated corpus:
1. First, we need to identify the following items:
   - reference markers
   - references
   - equations (?)
2. Then, we process the text with the NER parser.
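As an illustration of the first step (identifying reference markers), here is a minimal sketch assuming numeric bracketed markers such as `[3]`, `[3, 7]` or `[2-5]`. The pattern `MARKER_RE` and the function `strip_reference_markers` are hypothetical. This deliberately naive regex is not how grobid-core identifies reference markers, and it ignores author-year callouts entirely.

```python
import re

# Hypothetical pattern for numeric citation markers such as "[3]", "[3, 7]" or "[2-5]".
# The leading \s* also consumes the space before the marker.
# Author-year callouts like "(Smith et al., 2010)" are NOT covered.
MARKER_RE = re.compile(r"\s*\[\d+(?:\s*[-,]\s*\d+)*\]")


def strip_reference_markers(text: str) -> str:
    """Remove numeric bracketed reference markers, leaving plain text for the NER step."""
    return MARKER_RE.sub("", text)


print(strip_reference_markers("Paris is the capital of France [3, 7]."))
# prints "Paris is the capital of France."
```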
I think an important design principle for modular text mining in general is that it's up to the user/client of the API to apply the desired sequence of processes, not up to the module to accumulate processing/pre-processing steps. In other words, we build a NER module for extracting named entities from standard/plain text, and it's then up to the user of the NER module to ensure that the text matches the input specification.
So step 1 (identifying reference markers, references and equations) is not relevant to NER, although it could be added from grobid-core and other GROBID modules for a demo. Otherwise, there would always be a ton of preprocessing, document types, custom entities, formats, etc. that we can't handle in a generic manner.
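The division of labour described in this comment can be sketched in Python as follows. All names here (`compose`, `toy_ner`, `drop_markers`, `normalise_spaces`, `GAZETTEER`) are hypothetical. None of them are GROBID APIs, and the gazetteer lookup is only a stand-in for the real NER parser. The NER function accepts plain text and does nothing else, while the client decides which preprocessing steps run and in what order.

```python
import re
from functools import reduce
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface form, hypothetical entity class)


def compose(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Chain text-to-text steps left to right; the client chooses the order."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


# Toy NER: a gazetteer lookup standing in for the NER parser.
# It assumes plain-text input and performs no preprocessing itself.
GAZETTEER = {"Paris": "LOCATION", "France": "LOCATION"}


def toy_ner(plain_text: str) -> List[Entity]:
    tokens = [t.strip(".,;:") for t in plain_text.split()]
    return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]


# Hypothetical client-side preprocessing steps, owned by the caller, not the NER module.
def drop_markers(text: str) -> str:
    return re.sub(r"\s*\[\d+\]", "", text)


def normalise_spaces(text: str) -> str:
    return " ".join(text.split())


preprocess = compose(drop_markers, normalise_spaces)
print(toy_ner(preprocess("Paris  is the capital of France [3].")))
# prints [('Paris', 'LOCATION'), ('France', 'LOCATION')]
```

Because `toy_ner` never calls any preprocessing step itself, a demo could plug in other steps (for example ones backed by grobid-core) without touching the NER module.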