Christoph Auer
This PR will be superseded by another one, since it introduced unnecessary code duplication for this purpose.
Another example of a Word document that is not detected as such appears in issue https://github.com/DS4SD/docling/issues/476.
@copilot this fix is nonsense, therefore closing.
This appears to be addressed. Closing.
@maliktalha370 Can you please elaborate on what your expectations would be? The text information we output in JSON or Markdown is both the programmatic text embedded in PDFs, and the...
@ALIYoussef We would be excited to see alternative layout or table structure model implementations from the community. The example above posted by @dolfim-ibm is a good way to understand the...
@yannistml I can confirm docling starts to hang on our standard test PDF in `tests/data/pdf/2206.01062.pdf` and produces garbage output in the end. The problem appears to be rooted in the...
@ShiroYasha18 Thanks for the updates! To get it finalized, can you please get the CI checks green?
- Run the pre-commit toolchain: `poetry run pre-commit run --all-files -v`
- Ensure...
@ShiroYasha18 I updated the tests now. Let's see if CI passes, then it should be ok.
@Raphilanthrope I cannot find any logical difference between the original code and your proposal. Do you have a practical case where this change makes a difference? If yes, please...