Christoph Auer

170 comments by Christoph Auer

@Raphilanthrope I am closing this because it will be obsolete with an update I am making to the layout postprocessing.

@CVxTz Yes, we have this on our roadmap. There is some work in progress outlined [here](https://github.com/DS4SD/docling/discussions/227#discussioncomment-11176040) from @pavel-denisov-fraunhofer on running the layout model in ONNX. We will pick this up after...

@amal5haji @pchalasani We will keep tracking this request as part of issue https://github.com/DS4SD/docling/issues/309. I will close this issue to avoid duplication.

It appears that this issue is addressed in multiple places and can be closed. 1. Using `pipeline_options.ocr_options.force_full_page_ocr = True` (or `--force-ocr` on the CLI) in case you have a PDF...

@simonschoe You will need `force_full_page_ocr` if you want to ensure that only text cells from the OCR engine are processed. That is the case, for example, if your PDF does not...
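
For reference, the `force_full_page_ocr` setting mentioned in the two comments above could be applied from Python roughly as follows. This is a minimal sketch, not code from the original thread: the helper name `build_full_page_ocr_converter` and the file name `scanned.pdf` are hypothetical, and it assumes a docling release that exposes `PdfPipelineOptions`, `PdfFormatOption` and `DocumentConverter` under the module paths shown.

```python
def build_full_page_ocr_converter():
    """Return a docling DocumentConverter that OCRs every PDF page in full.

    Hypothetical helper for illustration only. The docling imports live
    inside the function so that this snippet can be loaded even where
    docling is not installed.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    # The option named in the comments: ignore the PDF's embedded text
    # cells and use only the text produced by the OCR engine.
    pipeline_options.ocr_options.force_full_page_ocr = True

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )


if __name__ == "__main__":
    # "scanned.pdf" is a placeholder path, not a file from the thread.
    converter = build_full_page_ocr_converter()
    result = converter.convert("scanned.pdf")
    print(result.document.export_to_markdown())
```

On the command line, the comment above names `--force-ocr` as the equivalent flag.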

@sindre-sonat Can you provide a PDF which exposes the problem? Thanks.

@mkrssg Many thanks for creating a fix. Could you please provide a sample Word document and a before/after comparison? Ideally, a test case with a sample doc would cover this change....

@mkrssg Many thanks for the test case. To ensure the CI tests will pass, there's one more step to do. Since this fix changes the test ground truth for docx...

@mkrssg It appears that a recent merge to `main` has created updated test files overlapping with yours. To rebase this, please update from `main` and resolve conflicts by accepting only...

@mkrssg It took a bit longer than expected to get tests working again because there was a global issue affecting all PRs. We have since merged [this](https://github.com/docling-project/docling/pull/1698) PR. Could you rebase...