Christoph Auer

170 comments by Christoph Auer

This PR will be superseded by another one, since it introduced unnecessary code duplication for this purpose.

Another sample of a Word document that is not detected as such can be seen in issue https://github.com/DS4SD/docling/issues/476.

@copilot this fix is nonsense, therefore closing.

@maliktalha370 Can you please elaborate on what your expectations would be? The text information we output in JSON or Markdown is both the programmatic text embedded in PDFs, and the...

@ALIYoussef We would be excited to see alternative layout or table structure model implementations from the community. The example posted above by @dolfim-ibm is a good way to understand the...

@yannistml I can confirm docling starts to hang on our standard test PDF in `tests/data/pdf/2206.01062.pdf` and produces garbage output in the end. The problem appears to be rooted in the...

@ShiroYasha18 Thanks for the updates! To get it finalized, can you please get the CI checks green?
- Run the pre-commit toolchain: `poetry run pre-commit run --all-files -v`
- Ensure...

@ShiroYasha18 I updated the tests now. Let's see if CI passes, then it should be ok.

@Raphilanthrope I cannot find any logical difference between the original code and your proposal. Do you have a practical case where this change makes a difference? If yes, please...