Add option to utilize LLMs to analyze and describe images within documents

Open hallkass opened this issue 11 months ago • 4 comments

Please add an option to utilize LLMs to analyze and describe images within documents such as PDFs, DOCX, PPTX, and others. These descriptions should then be automatically incorporated into the generated .md file.

Jan 03 '25 15:01 hallkass

+1 I think this will be helpful

Jan 05 '25 17:01 HeMuling

++ very important

Jan 09 '25 22:01 jeremedia

I've created PR #306, which does the job for PPTX files. For PDF and DOCX it is a bit more complicated, since the conversion to MD is completely handled within other libraries.

Jan 29 '25 08:01 masquare

Could we consider directly taking screenshots of each PDF page and then treating them as images for the LLM to process?

Apr 07 '25 06:04 Rainerhu