markitdown: How to get image captioning in docx files?

Hey, I tried to convert a docx file with images to md, but it does not do captioning:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")  # my local vLLM server
md = MarkItDown(llm_client=client, llm_model="microsoft/Phi-3.5-vision-instruct")

result = md.convert("file.docx")
print(result.text_content)
# .... ![](data:image/png;base64...) ....

What did I do wrong?

Thank you in advance for your reply!

Mar 07 '25 09:03 DmitryDiTy

Same issue here. Maybe a bug? It works well for .pptx and .jpg.

Mar 16 '25 06:03 tookdes

https://github.com/microsoft/markitdown/pull/1140 adds support for passing the keep_data_uri parameter to preserve the image data.

@afourney

Mar 23 '25 14:03 BetterAndBetterII

I think to be consistent with pptx etc., the request is to have the images get automatically captioned (either with the alt-text from the Word doc itself, or LLM-generated).

This is indeed a discrepancy, and I will work to address it in a future PR.
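For the alt-text half of that idea, here is a minimal sketch of reading alt text straight out of a .docx using only the standard library. It is not how markitdown does (or will do) it. It assumes the usual WordprocessingML layout, where each drawing carries a `wp:docPr` element whose `descr` attribute holds the alt text; the function name `docx_image_alt_texts` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the wordprocessingDrawing elements (wp:inline, wp:anchor, wp:docPr).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def docx_image_alt_texts(path_or_file) -> list[str]:
    """Return the alt text of every drawing in document order ("" if none is set).

    Hypothetical helper: wp:docPr is also present on non-picture drawings
    (charts, shapes), so this may return more entries than there are images.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [el.get("descr", "") for el in root.iter(f"{{{WP_NS}}}docPr")]
```

Empty strings in the result mark drawings without alt text, which are the ones an LLM-generated caption would have to cover.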

Mar 23 '25 17:03 afourney

I would also recommend allowing image placeholders in the converted Markdown, like Docling does, e.g. <--- image -1 ---> with the base64 data or a URL/path to the image, so that we can at least apply a vision LLM ourselves to generate captions from the information in these placeholders. Thanks
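A rough sketch of that post-processing step, assuming the converted Markdown contains the full base64 data URIs (not the truncated `base64...` form shown in the first post). The names `caption_data_uri_images` and `caption_fn` and the `image_N` placeholder format are invented for this example; none of them are part of markitdown.

```python
import base64
import re
from typing import Callable

# Matches Markdown images whose target is a complete base64 data URI,
# e.g. ![alt](data:image/png;base64,iVBORw0...)
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:(?P<mime>image/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=]+)\)"
)


def caption_data_uri_images(
    markdown: str, caption_fn: Callable[[bytes, str], str]
) -> str:
    """Replace each embedded image with a captioned, numbered placeholder.

    caption_fn is a user-supplied (hypothetical) callable that receives the
    decoded image bytes and the MIME type and returns a caption string.
    """
    counter = 0

    def _replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        image_bytes = base64.b64decode(match.group("data"))
        caption = caption_fn(image_bytes, match.group("mime"))
        caption = " ".join(caption.split())  # keep the caption on one line
        return f"![{caption}](image_{counter})"

    return DATA_URI_IMAGE.sub(_replace, markdown)
```

Here `caption_fn` is where a vision LLM would be called, for example by wrapping a chat-completions request to the OpenAI-compatible client from the first post. It is left as a plain callable so the replacement logic can be tried out without a model.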

Apr 07 '25 20:04 klynwuu

What’s the status on this? I’m also not able to get image captioning with an LLM for docx files.

Apr 18 '25 04:04 hxk1633

Any updates so far? Automated image captioning for .docx and .pdf files would certainly be useful.

Sep 19 '25 14:09 edwin-mui