
Update parser using the new format and annotated data

Open lfoppiano opened this issue 7 years ago • 1 comments

We process the text to identify NER information using the newly annotated corpus:

  1. First, identify the following items:
  • reference markers
  • references
  • equations (?)
  2. Then process the text with the NER parser.

lfoppiano avatar Jul 26 '17 12:07 lfoppiano

I think an important design principle for modularity in text mining is that it is up to the user/client of the API to apply the desired sequence of processing steps, not up to the module to accumulate processing and pre-processing. In other words, we build a NER module that extracts named entities from standard/plain text, and it is then up to the user of the NER module to ensure that the text matches the input specification.

So step 1 is not relevant to NER, although it could be added from grobid-core and other grobid modules for a demo. Otherwise, there would always be a ton of preprocessing, document types, custom entities, formats, etc. that we can't handle in a generic manner.

kermitt2 avatar Jul 26 '17 12:07 kermitt2
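
For illustration, the separation discussed in this thread (the client chains the pre-processing, the NER module only sees plain text) could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not grobid-ner code: `mask_reference_markers`, `toy_ner`, and `run_pipeline` are hypothetical names, the regex only covers bracketed numeric markers such as `[12]`, and `toy_ner` is a stand-in that simply returns capitalized tokens.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: a pre-processing step maps text to text,
# and an entity is (surface form, start offset, end offset).
Step = Callable[[str], str]
Entity = Tuple[str, int, int]


def mask_reference_markers(text: str) -> str:
    """Replace bracketed numeric markers like [12] or [3, 4] with spaces.

    Padding with spaces keeps character offsets aligned with the original
    text. Assumption: markers are bracketed numbers; other citation styles
    are not handled.
    """
    pattern = r"\[\d+(?:\s*[,-]\s*\d+)*\]"
    return re.sub(pattern, lambda m: " " * len(m.group()), text)


def toy_ner(text: str) -> List[Entity]:
    """Stand-in for a NER module that expects plain text.

    It only returns capitalized tokens with their offsets; a real NER
    parser would be called here instead.
    """
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def run_pipeline(text: str, steps: List[Step],
                 ner: Callable[[str], List[Entity]] = toy_ner) -> List[Entity]:
    """Client-side composition: apply the chosen steps, then call the NER."""
    for step in steps:
        text = step(text)
    return ner(text)


if __name__ == "__main__":
    sentence = "Darwin visited Chile [12] in 1834."
    print(run_pipeline(sentence, [mask_reference_markers]))
```

With this layout the client decides which steps to chain (reference markers, references, equations), and the NER function itself stays free of document-specific handling.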