Christoph Auer

Results: 170 comments by Christoph Auer

@mkrssg I transferred your branch [here](https://github.com/docling-project/docling/compare/dev/fix_msword_backend_identify_text_after_image) so that I can make edits from my side, since I wasn't sure what was causing the repeated test issues. Now the test files...

Hi @vishaldasnewtide, to allow proper reproduction please provide the original input file and conversion settings.

This should be addressed as part of training updates for the next layout model (see https://github.com/docling-project/docling-ibm-models/pull/92)

@harinisri2001 Can you please outline how you run `smoldocling`: through docling, or through code from the [smoldocling model card](https://huggingface.co/ds4sd/SmolDocling-256M-preview)? Also, can you share the affected PDF?

@Yash8745 I cannot verify this right now, but I know that the `generate_multimodal_pages` utility is terribly outdated and works on a legacy representation of the docling output. It clearly needs...

@gadgetlabs I can reproduce this error with the default docling settings, but I can successfully convert the file by switching to the `docling-parse-v2` backend:

```
docling --pdf-backend=dlparse_v2 the-adventures-of-sherlock-holmes-004-adventure-4-the-boscombe-valley-mystery.pdf
```
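If you want to script this workaround, here is a minimal sketch that tries the default settings first and retries with the `dlparse_v2` backend only when the first run fails. It uses only the CLI invocation shown above. The helper names `build_docling_command` and `convert_with_fallback` are hypothetical, and the sketch assumes the `docling` CLI is on your `PATH` and exits with a non-zero status when conversion fails, which I have not verified for every failure mode.

```python
import subprocess
from typing import List, Optional


def build_docling_command(pdf_path: str, pdf_backend: Optional[str] = None) -> List[str]:
    """Build the argument list for the docling CLI (hypothetical helper).

    With no backend given, the CLI runs with its default settings.
    """
    cmd = ["docling"]
    if pdf_backend is not None:
        # Same flag as in the command above, e.g. --pdf-backend=dlparse_v2
        cmd.append(f"--pdf-backend={pdf_backend}")
    cmd.append(pdf_path)
    return cmd


def convert_with_fallback(pdf_path: str, fallback_backend: str = "dlparse_v2") -> int:
    """Run docling with defaults; on failure, retry once with the fallback backend.

    Assumption: a failed conversion is signalled by a non-zero exit code.
    Returns the exit code of the last attempt.
    """
    result = subprocess.run(build_docling_command(pdf_path))
    if result.returncode == 0:
        return 0
    result = subprocess.run(build_docling_command(pdf_path, fallback_backend))
    return result.returncode
```

Calling `convert_with_fallback("input.pdf")` keeps the default backend for documents that already convert and only switches for the ones that do not.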

@pierre-sigwalt This log output has no practical meaning and we will remove it in the near future.

@jackdorney1999 Hi, can you please attach an example document and the minimal code to reproduce your issue? Thanks.

@bancroftway I just checked the output with our default pipeline. Here is what I see:

- "County of Steuben Town of Woodhull" is identified as a page header. It ends...

@nickrallison I re-tested both PDFs you provided above with docling==2.17.0, and I get output in both cases. I will therefore close this issue as resolved. If you find more evidence...