feature: batch ner

Open ClemDoum opened this issue 5 months ago • 0 comments

TODO

  • [ ] merge #1538

PR description

Implement batch text processing for NER. This change is made in the context of #1452, as batch processing is necessary for spaCy.

Changes

datashare-api ⚠️

Added

  • added the batch text processing API List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException to Pipeline
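A minimal sketch of what the batch API could look like from the caller's side. Only the processText signature comes from the description above; the Language values, the NlpTag fields, and the CapitalizedWordPipeline class are hypothetical stand-ins, not the actual datashare types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-ins: the real datashare types carry more information.
enum Language { ENGLISH, FRENCH }

record NlpTag(String mention, int offset) {}

interface Pipeline {
    // Signature as given in the PR description: one list of tags per input text,
    // returned in the same order as the batch.
    List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException;
}

// Toy pipeline (assumption): tags every capitalized word instead of running a real NER model.
class CapitalizedWordPipeline implements Pipeline {
    @Override
    public List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException {
        List<List<NlpTag>> results = new ArrayList<>();
        for (String text : (Iterable<String>) batch::iterator) {
            // Honour cancellation between texts, which is why the method can throw.
            if (Thread.interrupted()) {
                throw new InterruptedException();
            }
            results.add(tag(text));
        }
        return results;
    }

    // Records each capitalized, space-separated word with its character offset.
    private static List<NlpTag> tag(String text) {
        List<NlpTag> tags = new ArrayList<>();
        int offset = 0;
        for (String word : text.split(" ")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                tags.add(new NlpTag(word, offset));
            }
            offset += word.length() + 1;
        }
        return tags;
    }
}
```

The outer list is aligned with the input batch, so the i-th element holds the tags of the i-th text, and a text with no entities yields an empty list rather than being skipped.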

Changed

  • made NlpTag a record and a JSON-serializable class

datashare-core-nlp

Added

  • implemented batch processing for Stanford CoreNLP

datashare-app

Added

  • added bool Pipeline.Type.extractFromDoc(), which indicates whether the pipeline should preferably be used on full documents or can be used on text chunks
  • implemented batch text processing inside the ExtractNlpTask for pipelines which do not require prediction on documents
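As an illustration of how such a flag might drive the task, here is a rough sketch. The BatchingSketch class, the DOC_LEVEL and CHUNK_LEVEL type names, and the fixed-size chunking are assumptions made for the example; the actual ExtractNlpTask logic is not shown in this description and may differ.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    // Hypothetical pipeline types; only the extractFromDoc() accessor mirrors the PR.
    enum Type {
        DOC_LEVEL(true),
        CHUNK_LEVEL(false);

        private final boolean extractFromDoc;

        Type(boolean extractFromDoc) {
            this.extractFromDoc = extractFromDoc;
        }

        boolean extractFromDoc() {
            return extractFromDoc;
        }
    }

    // Splits text into consecutive chunks of at most maxChars characters (assumed strategy).
    static List<String> chunk(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChars) {
            chunks.add(text.substring(start, Math.min(text.length(), start + maxChars)));
        }
        return chunks;
    }

    // Returns the texts to send to the pipeline: the whole document when the
    // pipeline prefers full documents, otherwise a batch of chunks.
    static List<String> inputsFor(Type type, String documentText, int maxChunkChars) {
        return type.extractFromDoc()
            ? List.of(documentText)
            : chunk(documentText, maxChunkChars);
    }
}
```

With this shape, the chunk list returned for a chunk-level pipeline can be passed as a stream to the batch processText method in a single call.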

ClemDoum • Sep 05 '24 08:09