
fix: handle PDF soft line breaks

Open ClemDoum opened this issue 1 year ago • 5 comments

Currently, Tika discards the information about soft line breaks detected inside PDFs. These soft line breaks shouldn't be translated into hard line breaks in Datashare.
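The sketch below shows the expected behavior. It assumes a hypothetical `Line` record that carries a flag saying whether the break after each line is soft. This is not Tika's or Datashare's API, and Tika does not expose such a flag today. Soft breaks, which come from layout wrapping, are rendered as a single space. Hard breaks stay as newlines.

```java
import java.util.List;

public class SoftBreakJoiner {
    // Hypothetical model (not a Tika/Datashare type): one extracted line of text
    // plus whether the line break that follows it is a soft (layout) break.
    record Line(String text, boolean softBreakAfter) {}

    // Joins lines so that soft breaks become a space and hard breaks stay '\n'.
    // No separator is appended after the last line.
    static String join(List<Line> lines) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            Line line = lines.get(i);
            out.append(line.text());
            if (i < lines.size() - 1) {
                out.append(line.softBreakAfter() ? ' ' : '\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Line> lines = List.of(
                new Line("Soft line breaks come from", true),
                new Line("the PDF layout.", false),
                new Line("A new paragraph.", false));
        // Prints the first two lines joined by a space, then the third on its own line.
        System.out.println(join(lines));
    }
}
```

Joining soft breaks with a space is only one possible rendering. Hyphenated words split across a soft break would need extra handling.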

ClemDoum, Feb 13 '24

Waiting for the next release from Tika.

pirhoo, Mar 12 '24

This issue is stale because it has been open for 40 days with no activity.

github-actions[bot], Apr 22 '24

It might be worth asking the Tika team for a status update soon. cc @pirhoo

ClemDoum, May 14 '24

Sent an email to the Tika mailing list to get an update on the issue's status (see the [email protected] inbox).

ClemDoum, Jun 10 '24

Update on June 10th: Tim/Tilman said they would add parameters to `PDFParserConfig` that allow keeping soft line breaks. Waiting for the feature to be implemented.

ClemDoum, Jul 09 '24