Add option to utilize LLMs to analyze and describe images within documents
Please add an option to utilize LLMs to analyze and describe images within documents such as PDF, DOCX, PPTX, and other formats. These descriptions should then be automatically incorporated into the generated .md file.
+1 I think this will be helpful
++ very important
I've created PR #306, which does the job for PPTX files. PDF and DOCX are more complicated, since the conversion to Markdown is handled entirely by other libraries.
Could we consider directly taking screenshots of each PDF page and then treating them as images for the LLM to process?
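
A rough sketch of how that could fit together, not tied to how the project is actually structured. It assumes the pages have already been rendered to PNG bytes by some PDF library (for example PyMuPDF or pdf2image; not shown here), and that the LLM call is passed in as a plain callable. The names `pages_to_markdown`, `to_data_uri`, and `describe` are hypothetical and only for illustration.

```python
import base64
from typing import Callable, Iterable


def to_data_uri(png_bytes: bytes) -> str:
    """Encode PNG bytes as a base64 data URI, a form many vision APIs accept."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def pages_to_markdown(
    page_images: Iterable[bytes],
    describe: Callable[[bytes], str],
) -> str:
    """Build Markdown with one description per rendered page.

    `describe` is a placeholder for whatever sends an image to an LLM
    and returns its text description (an assumption, not a real API).
    """
    sections = []
    for number, image in enumerate(page_images, start=1):
        description = describe(image).strip()
        sections.append(f"## Page {number}\n\n{description}")
    if not sections:
        return ""
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    # Stand-in for a real LLM call, so the sketch runs without network access.
    def fake_describe(image: bytes) -> str:
        return f"(description of a {len(image)}-byte page image)"

    print(pages_to_markdown([b"page-one", b"page-two"], fake_describe))
```

Keeping the renderer and the LLM client outside the function means the same loop could be reused for DOCX or PPTX once their pages or embedded images are available as bytes. One trade-off to keep in mind: describing whole-page screenshots costs one LLM call per page, even for pages that contain no images.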