markitdown: images in docx files are not converted into the md document

The images in the document are converted into Markdown image tags like the one below, but the tags are incomplete: the base64 content is missing. ![](data:image/jpeg;base64...)

Apr 28 '25 07:04 keller31

After reading the documentation, I found a solution: with the keep_data_uris option, the md output retains the base64 content of each image.

Apr 28 '25 07:04 keller31

Example: markitdown xxx.docx --keep-data-uris > xxx.md
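For anyone calling markitdown from Python instead of the CLI, here is a minimal sketch of the same thing. It assumes that `convert()` accepts `keep_data_uris=True` as a keyword argument, mirroring the CLI flag, and that the result exposes `text_content`; check both against the version you have installed. `docx_to_markdown` is just a helper name made up for this example.

```python
def docx_to_markdown(path: str) -> str:
    """Convert a .docx file to Markdown, keeping embedded images as full data URIs."""
    # Imported inside the function so the snippet loads even where
    # markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    # Assumption: keep_data_uris is accepted as a keyword argument by convert(),
    # mirroring the --keep-data-uris CLI flag.
    result = converter.convert(path, keep_data_uris=True)
    return result.text_content
```

Calling `docx_to_markdown("xxx.docx")` and writing the returned string to `xxx.md` should match the CLI example. Expect the md file to get large, since every image is inlined as base64.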

Apr 28 '25 07:04 keller31

There is a PR, https://github.com/microsoft/markitdown/pull/277, that looks to address this. I'm keen to get this functionality merged; it seems pretty important to me. I will try to have a look at contributing code for it this week. If you can provide any further review on PR #277, I'll try to fork it and address the issues.

Apr 28 '25 11:04 joshjm

Personally, I have a post-processing step that greps through the output for the base64 data, generates a description of each image, then replaces the binary data with that description. It's a little fast and loose right now, but it has potential.
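A rough sketch of what such a post-processing step could look like, not the actual script. It assumes the md output contains complete `![alt](data:image/...;base64,...)` tags (i.e. it was produced with --keep-data-uris). `placeholder_description` is a hypothetical stand-in for whatever generates the real description (for example a vision model); here it only reports the MIME type and size.

```python
import base64
import re
from typing import Callable

# Matches Markdown image tags whose target is a complete base64 data URI.
# Truncated tags such as ![](data:image/jpeg;base64...) do not match.
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:(?P<mime>image/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=]+)\)"
)


def placeholder_description(mime: str, raw: bytes) -> str:
    """Hypothetical stand-in for a real captioning step."""
    return f"{mime} image, {len(raw)} bytes"


def replace_embedded_images(
    markdown: str,
    describe: Callable[[str, bytes], str] = placeholder_description,
) -> str:
    """Replace each embedded base64 image with a short text description."""

    def _substitute(match: re.Match) -> str:
        # b64decode raises binascii.Error on malformed padding; a real
        # pipeline would probably want to catch that and skip the image.
        raw = base64.b64decode(match.group("data"))
        description = describe(match.group("mime"), raw)
        alt = match.group("alt")
        label = f"{alt}: {description}" if alt else description
        return f"[Image: {label}]"

    return DATA_URI_IMAGE.sub(_substitute, markdown)
```

Passing a different `describe` callable is the hook for plugging in a real description generator; everything else is just the find-and-replace.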

Apr 28 '25 11:04 joshjm