Christoph Auer

Results: 170 comments by Christoph Auer

Note: The current version of docling (2.17.0) has the header ordering sorted out, but we are still working on proper key-value placement. Output:

Checking the attached PDF, it is no surprise that we see a very long conversion time. It is fully scanned and has a lot of pages, which is very slow on...

Hi @Manuel030, yes, this is indeed a problem with MS Office formats we are aware of. Let us have an iteration on this topic to see if we can find...

@pankpy Could you please provide an example to illustrate the behaviour? Thanks.

@aodingpeng I will investigate this issue. My suspicion is that the layout of this page is wrongly detected as a full page picture, hence all content in the detected picture...

@aodingpeng The current version of docling (2.17.0) treats your sample case better now. Certainly not perfect but you will get some meaningful text out. I will close this issue until...

@jerbob92 @BelaidCH We have a version in the works that will make it possible to get the in-picture content out; it will be released by the end of next week. I will post...

This has since been implemented and is ready to use.

@Raphilanthrope This is obsolete since docling 2.13.0 because the layout_utils code has been entirely replaced.

@Manuel030 @maxmnemonic There is apparently a newer PR with the same goal here: https://github.com/docling-project/docling/pull/1610 which has the proper condition to not produce empty text paragraphs.