Bi-column layout error with visual separator

Open Nowheresly opened this issue 9 months ago • 2 comments

Bug

Docling does not recognize a horizontal separator line in a two-column page layout. See below.

Steps to reproduce

Start docling-serve:

docker run --rm -e DOCLING_SERVE_ENABLE_UI=true -p 5001:5001 quay.io/docling-project/docling-serve:v0.6.0

Then browse to the UI. Choose URL convert and paste the following URL: https://www.amundi.lu/professional/dl/doc/prospectus/LU1807499428/ENG/LUX

Choose the following options:

Image

Now go to page 5:

Image

Expected: Docling takes into account the separator line in the middle of the page, so that the paragraph Management Process continues in the second column.

Management Process
The Sub-Funds... for the Sub-Fund.
Therefore, for the purpose... Risk Management method Commitment

----
Planning your investment...

Current: Docling processes the first column, then the second column, without taking into account the middle separator line.

Image

Management Process
The Sub-Funds... for the Sub-Fund.
----
Planning your investment...

Therefore, for the purpose... Risk Management method Commitment
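
To make the difference between the two orderings concrete, here is a minimal toy sketch in Python. It is not Docling's layout algorithm; the `Block` class, the coordinates, and both function names are hypothetical and only illustrate the idea that a full-width separator splits the page into horizontal bands, each of which should be read column by column before moving on to the next band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block; not a Docling type."""
    text: str
    column: int  # 0 = left column, 1 = right column
    top: float   # distance from the top of the page (smaller = higher)


def column_first_order(blocks):
    """Read the whole left column, then the whole right column (reported behaviour)."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.column, b.top))]


def separator_aware_order(blocks, separator_tops):
    """Split the page into bands at each full-width separator,
    then read column by column inside each band (expected behaviour)."""
    cuts = sorted(separator_tops)

    def band(block):
        # Number of separators lying above the block = index of its band.
        return sum(1 for cut in cuts if block.top > cut)

    return [b.text for b in sorted(blocks, key=lambda b: (band(b), b.column, b.top))]


# Made-up coordinates loosely mimicking page 5 of the sample document.
page = [
    Block("Management Process", column=0, top=10.0),
    Block("The Sub-Funds... for the Sub-Fund.", column=0, top=20.0),
    Block("Therefore, for the purpose... Risk Management method Commitment", column=1, top=10.0),
    Block("Planning your investment...", column=0, top=60.0),
]
separators = [50.0]  # assumed vertical position of the separator line

print(column_first_order(page))
print(separator_aware_order(page, separators))
```

With these assumed positions, the first call places "Planning your investment..." before the "Therefore, ..." paragraph, as in the current output, while the second keeps "Therefore, ..." attached to Management Process, as in the expected output.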

Docling version

2.25.1

Python version

3.12.9

Nowheresly avatar Mar 18 '25 16:03 Nowheresly

@Nowheresly Thanks for providing a sample for this edge case. We are actively working on this topic, stay tuned for future updates.

cau-git avatar May 21 '25 13:05 cau-git

Thank you @cau-git for your nice message. Rest assured, we will test the fix. Docling does a really good job, but financial documents are still very complex to parse, with a lot of subtle corner cases.

Nowheresly avatar May 25 '25 16:05 Nowheresly