Bi-column layout error with visual separator
Bug
Docling does not take a separator line into account when determining the reading order of a two-column page. See below.
Steps to reproduce
Start docling-serve:
docker run --rm -e DOCLING_SERVE_ENABLE_UI=true -p 5001:5001 quay.io/docling-project/docling-serve:v0.6.0
Then browse to the UI. Choose URL convert and paste the following URL: https://www.amundi.lu/professional/dl/doc/prospectus/LU1807499428/ENG/LUX
Choose the following options:
Now go to page 5:
Expected: Docling takes the separator line in the middle of the page into account, so that the Management Process paragraph continues in the second column.
Management Process
The Sub-Funds... for the Sub-Fund.
Therefore, for the purpose... Risk Management method Commitment
----
Planning your investment...
Current: Docling processes the first column, then the second column, without taking the middle separator line into account.
Management Process
The Sub-Funds... for the Sub-Fund.
----
Planning your investment...
Therefore, for the purpose... Risk Management method Commitment
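To make the difference between the two orderings concrete, here is a minimal sketch of a separator-aware reading order. It is not Docling's layout code. It assumes the separator is a horizontal rule spanning both columns, as the expected output suggests. The `Block` class, the `reading_order` function, and all coordinates are hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block with the top-left corner of its bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge, increasing downward


def reading_order(blocks: list[Block], separator_ys: list[float],
                  column_split_x: float) -> list[Block]:
    """Sort blocks band by band; inside a band, left column before right."""
    bounds = sorted(separator_ys)

    def band(b: Block) -> int:
        # Band index = number of horizontal separators above the block.
        return sum(1 for s in bounds if b.y > s)

    def column(b: Block) -> int:
        return 0 if b.x < column_split_x else 1

    return sorted(blocks, key=lambda b: (band(b), column(b), b.y))


# Made-up coordinates mimicking page 5: separator at y=400, columns split at x=300.
blocks = [
    Block("Management Process", 50, 100),
    Block("The Sub-Funds... for the Sub-Fund.", 50, 120),
    Block("Planning your investment...", 50, 420),
    Block("Therefore, for the purpose... Risk Management method Commitment", 320, 100),
]

# Separator-aware: the right column of the top band comes before "Planning...".
expected = [b.text for b in reading_order(blocks, [400.0], 300.0)]

# Separator ignored: the whole left column first, which matches the current output.
current = [b.text for b in reading_order(blocks, [], 300.0)]

print(expected)
print(current)
```

With no separators passed in, the sketch reproduces the column-by-column order reported above. Passing the separator position moves the "Therefore..." paragraph before "Planning your investment...".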
Docling version
2.25.1
Python version
3.12.9
@Nowheresly Thanks for providing a sample for this edge case. We are actively working on this topic, stay tuned for future updates.
Thank you @cau-git for your nice message. Rest assured we will test the fix. Docling does a really good job, but financial documents are still very complex to parse, with a lot of subtle corner cases.