Clément Doumouro
Results: 13 comments by Clément Doumouro
Sent an email to the Tika mailing list asking for an update on the issue's status (see the [email protected] inbox).
Update on June 10th: Tim/Tilman said they would update the PDFParserConfig with parameters that allow soft line breaks to be kept. Waiting for the feature to be implemented.
Wait for spaCy NER to be implemented, which should allow faster prototyping and easier text processing: https://github.com/ICIJ/datashare/issues/1452