
How to get image captioning in docx files?

Open DmitryDiTy opened this issue 9 months ago • 6 comments

Hey, I tried to convert a docx file with images to Markdown, but it does not caption the images:

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")  # my local VLLM host
md = MarkItDown(llm_client=client, llm_model="microsoft/Phi-3.5-vision-instruct")

result = md.convert("file.docx")
print(result.text_content)
# .... ![](data:image/png;base64...) ....

What did I do wrong?

Thank you in advance for your reply!

DmitryDiTy avatar Mar 07 '25 09:03 DmitryDiTy

Same issue here. Maybe a bug: it works well for .pptx and .jpg.

tookdes avatar Mar 16 '25 06:03 tookdes

https://github.com/microsoft/markitdown/pull/1140 adds support for passing the keep_data_uri parameter to preserve image data.

@afourney

BetterAndBetterII avatar Mar 23 '25 14:03 BetterAndBetterII

I think to be consistent with pptx etc., the request is to have the images get automatically captioned (either with the alt-text from the Word doc itself, or LLM-generated).

This is indeed a discrepancy, and I will work to address it in a future PR.

afourney avatar Mar 23 '25 17:03 afourney
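For readers who need the alt text today, here is a minimal sketch of reading it straight from the .docx package. It relies only on the standard library and on the fact that Word stores a picture's alt text in the `descr` attribute of its `wp:docPr` element inside `word/document.xml`. The function name `docx_alt_texts` is hypothetical, and this is not how MarkItDown handles docx files; it is a standalone workaround under those assumptions.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the wordprocessingDrawing elements (wp:inline, wp:anchor, wp:docPr).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def docx_alt_texts(source):
    """Return the alt text of every drawing in a .docx, in document order.

    `source` is a path or a binary file-like object. Drawings without
    alt text yield an empty string. (Hypothetical helper, not part of
    MarkItDown.)
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [
        element.get("descr", "")
        for element in root.iter(f"{{{WP_NS}}}docPr")
    ]
```

Calling `docx_alt_texts("file.docx")` returns one string per picture, which could then be matched up by position with the images in the converted Markdown.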

I would also recommend supporting image placeholders in the converted Markdown, like Docling does, for example <--- image -1 ---> with the base64 data or a URL or path to the image, so that we can at least apply a vision LLM to those placeholders to generate captions. Thanks

klynwuu avatar Apr 07 '25 20:04 klynwuu
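The post-processing idea above can be sketched without any change to MarkItDown, assuming the converted Markdown keeps complete data URIs (the truncated `data:image/png;base64...` form from the original post cannot be decoded). The names `caption_data_uri_images` and `caption_fn` are hypothetical: `caption_fn` stands in for whatever vision-model call you use and receives the decoded image bytes and the MIME type.

```python
import base64
import re

# Matches Markdown images whose target is a complete base64 data URI.
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\((?P<uri>data:(?P<mime>image/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=]+))\)"
)


def caption_data_uri_images(markdown, caption_fn):
    """Fill in empty alt text for embedded images using caption_fn.

    caption_fn(image_bytes, mime_type) -> str is supplied by the caller,
    e.g. a wrapper around a vision LLM. Images that already have alt
    text are left untouched.
    """

    def replace(match):
        if match.group("alt").strip():
            return match.group(0)  # keep the existing alt text
        image_bytes = base64.b64decode(match.group("data"))
        caption = caption_fn(image_bytes, match.group("mime"))
        caption = caption.replace("\n", " ").strip()
        return f"![{caption}]({match.group('uri')})"

    return DATA_URI_IMAGE.sub(replace, markdown)
```

Passing the caption function in as an argument keeps the sketch independent of any particular client or model, so it can be tried with a stub before wiring in a real vision endpoint.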

What’s the status on this? I’m also not able to get image captioning using an LLM for docx files.

hxk1633 avatar Apr 18 '25 04:04 hxk1633

Any updates so far? Automated image captioning for .docx and .pdf files would certainly be useful.

edwin-mui avatar Sep 19 '25 14:09 edwin-mui