
Figure export issue when sub-figures are present

Open: adisarun30 opened this issue.

Bug

When a figure in a research paper contains sub-figures, the sub-figures are sometimes split into multiple PNG files instead of being exported as a single PNG file. This doesn't always happen, but I have encountered it multiple times. ...

Steps to reproduce

I used the example code from here on the attached PDF. Specifically, Figure 2 in the PDF contains 4 sub-figures, and exporting it produced 4 separate PNG files. ...
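Until the layout model handles this, one possible post-processing workaround is to merge the fragments back together: take the bounding boxes of the pictures Docling returned, merge boxes on the same page whose horizontal and vertical separations are both within a small gap, and crop the page once per merged box. The sketch below is a minimal illustration under assumptions. The `Box` class, the `merge_adjacent_boxes` function and the `gap` threshold are hypothetical and not part of Docling's API, and the coordinates are assumed to use a top-left origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box (page coordinates, top-left origin)."""
    page: int
    left: float
    top: float
    right: float
    bottom: float

    def near(self, other: "Box", gap: float) -> bool:
        # Same page, and the horizontal and vertical separations between
        # the two boxes are both at most `gap` (overlapping boxes count too).
        return (
            self.page == other.page
            and self.left - gap <= other.right
            and other.left - gap <= self.right
            and self.top - gap <= other.bottom
            and other.top - gap <= self.bottom
        )

    def union(self, other: "Box") -> "Box":
        # Smallest box that contains both boxes.
        return Box(
            self.page,
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )


def merge_adjacent_boxes(boxes: list[Box], gap: float = 10.0) -> list[Box]:
    """Repeatedly merge any two near boxes until no such pair remains."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if merged[i].near(merged[j], gap):
                    merged[i] = merged[i].union(merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Four panels of a 2x2 figure plus an unrelated figure further down the page.
panels = [
    Box(1, 50, 100, 250, 300), Box(1, 255, 100, 450, 300),
    Box(1, 50, 305, 250, 500), Box(1, 255, 305, 450, 500),
    Box(1, 50, 600, 450, 700),
]
print(merge_adjacent_boxes(panels))
```

The `gap` value is the main design choice: too small and panels separated by whitespace stay split, too large and genuinely distinct figures get fused. If the page is available as a rendered image, each merged box (scaled to pixel coordinates) could then be cropped once, for example with Pillow's `Image.crop((left, top, right, bottom))`.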

Docling version

Docling version: 2.30.0
Docling Core version: 2.27.0
Docling IBM Models version: 3.4.1
Docling Parse version: 4.0.1
Python: cpython-310 (3.10.16)
Platform: Windows-10-10.0.26100-SP0
...

Python version

Python 3.10.16

NEJMoa2201445.pdf

...

adisarun30 · Apr 22 '25

@adisarun30 thanks for providing a sample. We are aware of this behavior, and plan to resolve this in a future update with improvements in our layout detection model.

cau-git · May 21 '25

I am also facing the same issue. Is there a workaround so that the caption of the figure assembly is duplicated for all the split subfigures?

Currently, when a figure that contains subfigures (with a single caption that describes all of them) is split into separate pictures, only one of the resulting pictures gets the caption text in its captions attribute.

fathi0amir · Aug 28 '25
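A minimal sketch of the caption duplication asked about above, under assumptions: it does not call Docling at all, but works on a hypothetical `Picture` record that you would fill yourself from Docling's output (page number, bounding box, caption text per picture). Pictures on the same page whose boxes lie within `gap` of each other are grouped with a small union-find, and the first non-empty caption in each group is copied to every member. `Picture`, `propagate_captions` and `gap` are illustrative names, not Docling API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Picture:
    """Hypothetical stand-in for one exported picture (not a Docling class)."""
    page: int
    left: float
    top: float
    right: float
    bottom: float
    caption: str = ""


def _near(a: Picture, b: Picture, gap: float) -> bool:
    # Same page, and horizontal and vertical separations both at most `gap`.
    return (
        a.page == b.page
        and a.left - gap <= b.right
        and b.left - gap <= a.right
        and a.top - gap <= b.bottom
        and b.top - gap <= a.bottom
    )


def propagate_captions(pictures: list[Picture], gap: float = 10.0) -> list[Picture]:
    """Return new pictures where every member of a group shares its caption.

    Groups are connected components of the "near" relation; a group's
    caption is the first non-empty caption in input order.
    """
    parent = list(range(len(pictures)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pictures)):
        for j in range(i + 1, len(pictures)):
            if _near(pictures[i], pictures[j], gap):
                parent[find(i)] = find(j)

    group_caption: dict[int, str] = {}
    for i, pic in enumerate(pictures):
        root = find(i)
        if pic.caption and root not in group_caption:
            group_caption[root] = pic.caption

    return [
        replace(pic, caption=group_caption.get(find(i), pic.caption))
        for i, pic in enumerate(pictures)
    ]


pics = [
    Picture(1, 50, 100, 250, 300),
    Picture(1, 255, 100, 450, 300, "Figure 2. Outcomes."),
    Picture(1, 50, 305, 250, 500),
    Picture(1, 255, 305, 450, 500),
    Picture(2, 0, 0, 100, 100, "Figure 3."),
]
print([p.caption for p in propagate_captions(pics)])
```

Because grouping is transitive, a chain of adjacent panels ends up in one group even when its first and last panels are far apart, which matches how split sub-figures of a grid usually sit. If a group happens to contain two different captions, this sketch keeps only the first.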

While making https://github.com/Future-House/paper-qa multimodal so it can complete LAB-Bench's FigQA, this has been the biggest issue.

Docling version: 2.58.0
Docling Core version: 2.49.0
Docling IBM Models version: 3.10.1
Docling Parse version: 4.7.0

Variants include:

  • Not propagating ticks or labels from axes
  • Not including title or legend
  • Splitting up subfigures that share an axis
  • Chopping subfigures in half
  • Getting confused by hrules or vrules in subfigures

Some DOIs this happens with: 10.1016/j.neuron.2015.10.001, 10.1016/j.cell.2021.03.032, 10.1038/s41593-019-0566-1, 10.1038/s41586-019-1647-8

jamesbraza · Oct 29 '25