[Question] Figures and captions

Open austinmw opened this issue 1 year ago • 2 comments

Question

Hi, is this library also able to use multi-modal LLMs to interpret charts and figures within PDF documents?

Dec 03 '24 15:12 austinmw

Hello @austinmw,

Thank you for your question. Currently, Docling does not offer this feature. However, our team is actively working on introducing an image classifier model first, and then a multi-modal LLM that can convert charts into structured formats like JSON, CSV, and Markdown. Stay tuned!

Dec 09 '24 09:12 Matteo-Omenetti

@Matteo-Omenetti keep us updated on this!

Dec 09 '24 17:12 simjak

I'm parsing some documents months after this discussion. For some reason, my Docling document identifies many captions and labels them as captions, but it doesn't associate them with the images that are right above them. Is anyone else having this issue?

Mar 19 '25 14:03 rafaelghiorzi
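
A possible workaround for the missing caption-to-image association described above is to pair items yourself in post-processing, based on page layout. The sketch below is only an illustration and does not use Docling's API: the `Box` type, the `match_captions` function, and the `max_gap` threshold are all hypothetical, and it assumes you have already extracted a page number and bounding box for each picture and caption, in a top-left-origin coordinate system where y grows downward. If your boxes use a bottom-left origin, the gap calculation would need to be flipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical layout box (not a Docling type).

    Assumes a top-left origin: y grows downward, so top < bottom.
    """
    ident: str
    page: int
    left: float
    top: float
    right: float
    bottom: float


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal span shared by two boxes (0 if none)."""
    return max(0.0, min(a.right, b.right) - max(a.left, b.left))


def match_captions(
    pictures: List[Box],
    captions: List[Box],
    max_gap: float = 40.0,  # assumed threshold, in the same units as the boxes
) -> Dict[str, Optional[str]]:
    """Map each caption id to the id of the closest picture directly above it.

    A picture qualifies only if it is on the same page, overlaps the caption
    horizontally, and ends at most `max_gap` units above the caption's top.
    Captions with no qualifying picture map to None.
    """
    matches: Dict[str, Optional[str]] = {}
    for cap in captions:
        best: Optional[Box] = None
        best_gap: Optional[float] = None
        for pic in pictures:
            if pic.page != cap.page:
                continue
            if horizontal_overlap(pic, cap) <= 0:
                continue
            gap = cap.top - pic.bottom  # vertical distance from picture to caption
            if gap < 0 or gap > max_gap:
                continue
            if best_gap is None or gap < best_gap:
                best, best_gap = pic, gap
        matches[cap.ident] = best.ident if best is not None else None
    return matches
```

The threshold keeps a caption from being attached to a distant picture higher up the page. Captions that map to `None` are worth inspecting by hand, since they may belong to tables or sit above their figure instead of below it.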