datashare
feature: batch NER
TODO
- [ ] merge #1538
PR description
Implement batch text processing for NER. This change is made in the context of #1452, as batch processing is necessary for spaCy.
Changes
datashare-api ⚠️
Added
- added the batch text processing API `List<List<NlpTag>> processText(Stream<String> batch, Language language) throws InterruptedException` to `Pipeline`
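A minimal sketch of how the batch method could sit on `Pipeline`, assuming a hypothetical single-text `processText(String, Language)` method and simplified stand-ins for `Language` and `NlpTag` (the record fields `begin`, `end`, `category` are illustrative, not datashare's actual fields). The default implementation simply loops over the batch. Because the method declares the checked `InterruptedException`, it uses an iterator instead of `Stream.map`. A pipeline with native batching would override it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class BatchPipelineSketch {
    // Hypothetical, simplified stand-ins for datashare's Language and NlpTag types.
    enum Language { ENGLISH, FRENCH }

    record NlpTag(int begin, int end, String category) {}

    interface Pipeline {
        // Hypothetical single-text method assumed for this sketch.
        List<NlpTag> processText(String text, Language language) throws InterruptedException;

        // Batch method with the signature from this PR. This default loops over the batch
        // (the checked InterruptedException rules out a plain Stream.map); pipelines with
        // native batching can override it.
        default List<List<NlpTag>> processText(Stream<String> batch, Language language)
                throws InterruptedException {
            List<List<NlpTag>> results = new ArrayList<>();
            Iterator<String> texts = batch.iterator();
            while (texts.hasNext()) {
                results.add(processText(texts.next(), language));
            }
            return results;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Toy pipeline (illustration only): tags every capitalized word as PERSON.
        Pattern capitalized = Pattern.compile("\\b[A-Z][a-z]+\\b");
        Pipeline toy = (text, language) -> {
            List<NlpTag> tags = new ArrayList<>();
            Matcher m = capitalized.matcher(text);
            while (m.find()) {
                tags.add(new NlpTag(m.start(), m.end(), "PERSON"));
            }
            return tags;
        };
        List<List<NlpTag>> out = toy.processText(Stream.of("Alice met Bob", "no names here"), Language.ENGLISH);
        // [[NlpTag[begin=0, end=5, category=PERSON], NlpTag[begin=10, end=13, category=PERSON]], []]
        System.out.println(out);
    }
}
```

The result keeps one tag list per input text, in batch order, so callers can zip results back to their inputs.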
Changed
- made `NlpTag` a record and a JSON-serializable class
datashare-core-nlp
Added
- implemented batch processing for Stanford CoreNLP
datashare-app
Added
- added `boolean Pipeline.Type.extractFromDoc()`, which indicates whether the pipeline should preferably be used on full documents or can be used on text chunks
- implemented batch text processing inside the `ExtractNlpTask` for pipelines which do not require prediction on full documents
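A minimal sketch of the routing decision `extractFromDoc()` enables, assuming hypothetical enum values (`DOC_PIPELINE`, `CHUNK_PIPELINE`), a hypothetical fixed-size character chunker and a hypothetical `Chunk` record. datashare's real `Pipeline.Type` values and its chunking strategy are not described here. Document-level pipelines receive the whole content. The others receive a batch of chunks, each carrying its start offset so tag offsets can be shifted back to document coordinates.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractRoutingSketch {
    // Hypothetical pipeline types; datashare's actual Pipeline.Type values differ.
    enum Type {
        DOC_PIPELINE(true), CHUNK_PIPELINE(false);

        private final boolean extractFromDoc;

        Type(boolean extractFromDoc) { this.extractFromDoc = extractFromDoc; }

        // true: the pipeline should preferably see the full document.
        boolean extractFromDoc() { return extractFromDoc; }
    }

    // Hypothetical chunk: text plus its start offset within the original document.
    record Chunk(int start, String text) {}

    // Hypothetical helper: split content into fixed-size character chunks.
    static List<Chunk> chunk(String content, int chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < content.length(); i += chunkSize) {
            chunks.add(new Chunk(i, content.substring(i, Math.min(content.length(), i + chunkSize))));
        }
        return chunks;
    }

    // Decide what is fed to the pipeline: the full document, or a batch of chunks.
    static List<Chunk> inputsFor(Type type, String content, int chunkSize) {
        return type.extractFromDoc() ? List.of(new Chunk(0, content)) : chunk(content, chunkSize);
    }

    public static void main(String[] args) {
        String doc = "abcdefghij";
        System.out.println(inputsFor(Type.DOC_PIPELINE, doc, 4));
        System.out.println(inputsFor(Type.CHUNK_PIPELINE, doc, 4));
    }
}
```

Fixed-size character chunks are only for illustration: they can cut an entity in half, so a real implementation would more likely split on sentence or paragraph boundaries before passing the chunk texts to the batch `processText`.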