How to get image captioning in docx files?
Hey, I tried to convert a docx file with images to md, but it does not do captioning:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy") # my local VLLM host
md = MarkItDown(llm_client=client, llm_model="microsoft/Phi-3.5-vision-instruct")
result = md.convert("file.docx")
print(result.text_content)
# ....  ....
What did I do wrong?
Thank you in advance for your reply!
Same issue here. Maybe a bug: for .pptx or .jpg it works well.
https://github.com/microsoft/markitdown/pull/1140 adds support for passing the keep_data_uri parameter to preserve image information.
@afourney
I think to be consistent with pptx etc., the request is to have the images get automatically captioned (either with the alt-text from the Word doc itself, or LLM-generated).
This is indeed a discrepancy, and I will work to address it in a future PR.
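Until that lands, the alt text stored in the Word document can be read directly, since a .docx file is a zip archive with the body in word/document.xml. The following is only a minimal sketch and not part of markitdown: `read_docx_alt_texts` is a hypothetical helper name, and it assumes the alt text sits in the `descr` attribute of `wp:docPr` elements, which is where Word normally stores it for inline and floating pictures.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the DrawingML "wordprocessingDrawing" elements (wp:docPr etc.).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def read_docx_alt_texts(docx_file) -> list[str]:
    """Return the alt text of each drawing in document order.

    Hypothetical helper, not a markitdown API. `docx_file` may be a path
    or a binary file-like object. Drawings without alt text yield "".
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Each picture carries a wp:docPr element; its "descr" attribute
    # holds the alt text when the author provided one.
    return [el.get("descr", "") for el in root.iter(f"{{{WP_NS}}}docPr")]
```

Calling `read_docx_alt_texts("file.docx")` returns one string per picture; empty strings mark the images that would still need an LLM-generated caption.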
I would also recommend emitting image placeholders in the converted markdown, like Docling does, e.g. <--- image -1 ---> with the base64 data or a URL/path to the image, so that we can at least apply a vision LLM ourselves to generate captions from the information in these placeholders. Thanks
What’s the status on this? I’m also not able to get image captioning using an LLM for docx files.
Any updates so far? Automated image captioning for .docx and .pdf files would certainly be useful.
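As a rough illustration of that post-processing idea, here is a minimal sketch that assumes the converted Markdown keeps images as base64 data URIs (the behavior the PR linked above is described as enabling). `caption_embedded_images` and `caption_fn` are hypothetical names, not markitdown APIs; `caption_fn` stands in for whatever vision-LLM call you use.

```python
import base64
import re
from typing import Callable

# Matches Markdown images whose target is a base64 data URI, e.g.
# ![alt](data:image/png;base64,AAAA...)
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:(?P<mime>image/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=]+)\)"
)


def caption_embedded_images(
    markdown: str, caption_fn: Callable[[bytes, str], str]
) -> str:
    """Replace each embedded data-URI image with a text caption.

    caption_fn receives the decoded image bytes and the MIME type and
    returns a caption string (for example, from a vision LLM).
    """

    def replace(match: re.Match) -> str:
        image_bytes = base64.b64decode(match.group("data"))
        caption = caption_fn(image_bytes, match.group("mime"))
        alt = match.group("alt").strip()
        # Keep any existing alt text in front of the generated caption.
        label = f"{alt}: {caption}" if alt else caption
        return f"[Image: {label}]"

    return DATA_URI_IMAGE.sub(replace, markdown)
```

With an OpenAI-compatible client, `caption_fn` could re-encode the bytes as a data URI and send them as an `image_url` content part in a chat completion request; that part is left out here because it depends on the model and server in use.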