datashare
fix: handle PDF soft line breaks
Currently Tika loses track of the soft line breaks it detects inside PDFs; these line breaks shouldn't be translated into hard line breaks inside Datashare.
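Until Tika exposes an option for this, a post-processing heuristic is one possible stopgap. The sketch below is hypothetical and is not part of Tika or Datashare. It assumes real paragraph boundaries (hard breaks) show up in the extracted text as blank lines, and that every other single newline is a soft break introduced by the PDF layout.

```python
import re


def join_soft_line_breaks(text: str) -> str:
    """Hypothetical workaround: rejoin lines wrapped by the PDF layout.

    Assumption: hard breaks appear as one or more blank lines; any other
    single newline is treated as a soft break and replaced by a space.
    """
    # Split into paragraphs on blank lines (newline, optional whitespace, newline).
    paragraphs = re.split(r"\n\s*\n", text)
    joined = []
    for paragraph in paragraphs:
        # Strip each wrapped line and collapse the soft breaks into single spaces.
        lines = [line.strip() for line in paragraph.splitlines()]
        joined.append(" ".join(line for line in lines if line))
    # Re-emit the non-empty paragraphs separated by a single blank line.
    return "\n\n".join(p for p in joined if p)


print(join_soft_line_breaks("Currently Tika\nloses soft breaks.\n\nSecond paragraph\nhere."))
```

The heuristic cannot tell a soft break from a genuine single-newline hard break (list items, addresses, poetry), which is why preserving the soft/hard distinction at the Tika level is preferable.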
Waiting for the next release from Tika.
This issue is stale because it has been open for 40 days with no activity.
It might be worth asking the Tika team for a status update soon. cc @pirhoo
Sent an email to the Tika mailing list asking for an update on the issue's status (see [email protected] inbox).
Update on June 10th: Tim/Tilman said they would add parameters to PDFParserConfig to allow keeping soft line breaks. Waiting for the feature to be implemented.