Standalone equations are partially confused with regular text items

Open: cau-git opened this issue 11 months ago • 1 comment

Bug

The layout detector misclassifies some obvious standalone equations as text items. The post-processing should be improved so that it resolves these cases correctly when competing proposals cover the same element with different labels and confidences.
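To make the failure mode concrete, here is a minimal sketch of one way competing proposals on the same element could be resolved. It is not Docling's actual post-processing: the `Proposal` class, the `resolve` function, the IoU threshold, and the per-label bias are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Proposal:
    """Hypothetical layout proposal: a label, a confidence, and a box."""
    label: str
    confidence: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def resolve(
    proposals: List[Proposal],
    iou_threshold: float = 0.8,
    label_bias: Optional[Dict[str, float]] = None,
) -> List[Proposal]:
    """Keep one proposal per element, preferring the highest adjusted score.

    label_bias is an assumed knob: a per-label offset added to the raw
    confidence before ranking, so a slightly weaker "formula" proposal
    can still beat a "text" proposal on the same region.
    """
    bias = label_bias or {}

    def score(p: Proposal) -> float:
        return p.confidence + bias.get(p.label, 0.0)

    kept: List[Proposal] = []
    for p in sorted(proposals, key=score, reverse=True):
        # Drop p if it overlaps heavily with an already accepted proposal.
        if all(iou(p.box, k.box) < iou_threshold for k in kept):
            kept.append(p)
    return kept


# Two competing proposals on (nearly) the same region.
candidates = [
    Proposal("text", 0.62, (10, 10, 110, 30)),
    Proposal("formula", 0.58, (12, 10, 110, 30)),
]
plain = resolve(candidates)
biased = resolve(candidates, label_bias={"formula": 0.1})
```

With raw confidences alone the "text" proposal wins, which matches the behaviour reported here. Adding a small bias for "formula" flips the outcome. Whether a fixed bias is the right remedy is an open question; it only illustrates where in the post-processing such a decision would sit.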

Steps to reproduce

Convert the provided example PDF (code_and_formulas_2.pdf) and observe the missed formulas.
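A possible way to check this from Python is sketched below. It assumes the attached file has been saved locally as `code_and_formulas_2.pdf` and that the default `DocumentConverter` pipeline is used. The helper names `count_labels` and `convert_and_count` are made up for this example.

```python
from collections import Counter


def count_labels(items):
    """Count document items per label (hypothetical helper)."""
    counts = Counter()
    for item in items:
        label = getattr(item, "label", "unknown")
        # Labels may be enum members; fall back to the plain value if so.
        counts[str(getattr(label, "value", label))] += 1
    return counts


def convert_and_count(pdf_path):
    """Convert a PDF with Docling and count the labels of its items."""
    # Imported lazily so count_labels can be used without docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    # iterate_items() yields (item, level) pairs in reading order.
    return count_labels(item for item, _level in result.document.iterate_items())


if __name__ == "__main__":
    # Assumed local path to the PDF attached to this issue.
    for label, n in sorted(convert_and_count("code_and_formulas_2.pdf").items()):
        print(f"{label}: {n}")
```

If the number of items labelled as formulas is lower than the number of standalone equations visible in the PDF, the remaining equations were most likely emitted as text items.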

Docling version


Docling version: 2.15.1
Docling Core version: 2.14.0
Docling IBM Models version: 3.1.2
Docling Parse version: 3.0.0

Python version

Any

cau-git, Jan 13 '25 09:01

This should be addressed as part of the training updates for the next layout model (see https://github.com/docling-project/docling-ibm-models/pull/92).

cau-git, May 21 '25 12:05