markitdown

Add option to utilize LLMs to analyze and describe images within documents

Open hallkass opened this issue 11 months ago • 4 comments

Please add an option to utilize LLMs to analyze and describe images within documents such as PDFs, DOCX, PPTX, and others. These descriptions should then be automatically incorporated into the generated .md file.

hallkass, Jan 03 '25 15:01
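A minimal sketch of what the request could look like in practice, assuming the converter has already produced Markdown that contains image references. The function name `add_image_descriptions` and the `describe` callback are hypothetical and not part of markitdown; `describe` stands in for whatever vision-capable LLM call is used.

```python
import re
from typing import Callable

# Matches Markdown image references such as ![alt text](path/to/image.png)
IMAGE_PATTERN = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)")


def add_image_descriptions(markdown: str, describe: Callable[[str], str]) -> str:
    """Append a description after every image reference in `markdown`.

    `describe` is a hypothetical callback: it receives the image source
    (path or URL) and returns descriptive text, e.g. from an LLM.
    """

    def _replace(match: re.Match) -> str:
        description = describe(match.group("src")).strip()
        if not description:
            # Leave the image reference untouched if nothing came back.
            return match.group(0)
        return f"{match.group(0)}\n\n> Image description: {description}"

    return IMAGE_PATTERN.sub(_replace, markdown)
```

Passing the LLM call in as a callback keeps the Markdown handling independent of any particular model or client library.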

+1 I think this will be helpful

HeMuling, Jan 05 '25 17:01

++ very important

jeremedia, Jan 09 '25 22:01

I've created PR #306, which does the job for PPTX files. For PDF and DOCX it is a bit more complicated, since the conversion to MD is completely handled within other libraries.

masquare, Jan 29 '25 08:01

Could we consider directly taking screenshots of each PDF page and then treating them as images for the LLM to process?

Rainerhu, Apr 07 '25 06:04
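The page-screenshot idea could be sketched roughly as follows. This is an illustration under assumptions, not markitdown's implementation: `render_pdf_pages` assumes the third-party PyMuPDF package is installed, and `describe_page` is a hypothetical callback that sends one page image to an LLM and returns Markdown text.

```python
from typing import Callable, Iterable, Iterator


def render_pdf_pages(pdf_path: str, dpi: int = 150) -> Iterator[bytes]:
    """Yield each page of a PDF as PNG bytes.

    Assumption: PyMuPDF is available. It is imported lazily so the rest
    of this sketch works without it.
    """
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        for page in doc:
            yield page.get_pixmap(dpi=dpi).tobytes("png")


def pages_to_markdown(
    page_images: Iterable[bytes],
    describe_page: Callable[[bytes], str],
) -> str:
    """Build one Markdown document from per-page LLM output.

    `describe_page` is hypothetical: it takes the PNG bytes of a page
    and returns the text the model produced for that page.
    """
    sections = []
    for number, image in enumerate(page_images, start=1):
        text = describe_page(image).strip()
        # An HTML comment marks page boundaries without affecting rendering.
        sections.append(f"<!-- page {number} -->\n\n{text}")
    return "\n\n".join(sections)
```

Usage would be along the lines of `pages_to_markdown(render_pdf_pages("report.pdf"), describe_page)`. One trade-off to keep in mind: every page costs one LLM call, including pages that contain only plain text.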