Christoph Auer

Results: 170 comments by Christoph Auer

@SebastianCallh This could be related to the findings summarized in https://github.com/DS4SD/docling/issues/802. We have it on the radar.

@choigawoon Docling supports Excel worksheets. Have you tested whether exporting the Google spreadsheet to Excel format and then running it through Docling works?

We can use this as an example of a scanned PDF: https://github.com/ocrmypdf/OCRmyPDF/issues/1157#issuecomment-1762851062

@Bariskau What input format were you using? Is this a native PowerPoint file, a PDF, or something else? If you provide the source file, we can verify more easily.

@Bariskau @mkhalid12 A revised reading-order model is currently under development. We will post updates when it is ready.

@Fogapod Closing this issue, as it appears no further follow-up is required. Please reopen if you have further input.

@mllife This is a matter of how the PDF encodes the text: you will get out whatever the PDF has encoded in it. So, this is not a matter of...

@dhdaines Thanks for this contribution. We would prefer not to put approximate pagination information in the output document, but rather to go with an unpaginated document if there is no reliable...

@DeezNutz6942O I think for this purpose it would be sufficient to _include_ radio items in the checkbox detection. They may already be detected as checkboxes now, but it might be...

This is a classic edge case. We hope consistency will improve with refinements to the layout detection model, which is currently work-in-progress.