feat: Add DOC file support

Open ashmod opened this issue 5 months ago • 8 comments

Summary

Adds support for legacy Microsoft Word DOC files (.doc) to MarkItDown.

Implementation Details

I could not find an out-of-the-box library for DOC-to-Markdown conversion, so I went with a two-step approach: convert the .doc to .docx, then convert the .docx to Markdown using the existing converter module. The minor issue here is dependencies: every library I found requires some external dependency (usually LibreOffice). To avoid external installs as much as possible, I implemented an OS-specific approach: on Linux it uses the LibreOffice CLI tool, while on Windows it uses MS Word's COM interface.
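The two-step, OS-specific flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names (`pick_backend`, `libreoffice_command`, `doc_to_docx`) are hypothetical, the Windows branch assumes `pywin32` and an installed copy of Word, and treating every non-Windows system as a LibreOffice system is a simplification.

```python
import platform
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pick_backend(system: Optional[str] = None) -> str:
    """Choose a conversion backend by OS (hypothetical helper).

    Assumption: anything that is not Windows falls back to LibreOffice.
    """
    system = system or platform.system()
    return "word_com" if system == "Windows" else "libreoffice"


def libreoffice_command(doc_path: Path, out_dir: Path, binary: str = "soffice") -> List[str]:
    """Build the headless LibreOffice command that writes a .docx into out_dir."""
    return [
        binary,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def doc_to_docx(doc_path: Path, out_dir: Path) -> Path:
    """Step 1: convert a legacy .doc file to .docx and return the new path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (doc_path.stem + ".docx")

    if pick_backend() == "word_com":
        # Assumption: pywin32 is installed and MS Word is available.
        import win32com.client

        word = win32com.client.Dispatch("Word.Application")
        word.Visible = False
        try:
            doc = word.Documents.Open(str(doc_path.resolve()))
            # 12 is wdFormatXMLDocument, i.e. the .docx format.
            doc.SaveAs2(str(target.resolve()), FileFormat=12)
            doc.Close(False)
        finally:
            word.Quit()
    else:
        binary = shutil.which("soffice") or shutil.which("libreoffice")
        if binary is None:
            raise RuntimeError("LibreOffice CLI not found on PATH")
        subprocess.run(
            libreoffice_command(doc_path, out_dir, binary),
            check=True,
            timeout=120,
        )
    return target
```

Step 2 would then hand the resulting .docx to the existing DOCX path, for example `MarkItDown().convert(str(target)).text_content`. Converting into a temporary directory and deleting it afterwards would keep the intermediate file from leaking onto the user's disk.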

Testing

  • All existing tests pass
  • DocConverter is properly registered, accepts DOC files, and parses their content correctly (tests passed on Linux and Windows)

Fixes #23, #1220

ashmod avatar Jul 08 '25 11:07 ashmod

really need it

BetterAndBetterII avatar Jul 14 '25 13:07 BetterAndBetterII

I really need this doc conversion!!!

keller31 avatar Aug 19 '25 09:08 keller31

+1 we need it

HOUTASU avatar Aug 20 '25 08:08 HOUTASU

+1 on needing this doc change, it will resolve a lot of problems :pray:

SystemAgent avatar Aug 20 '25 08:08 SystemAgent

+1, the PR has been open for a long time. Can't we merge it?

JonLev avatar Oct 07 '25 09:10 JonLev

+1 really needed

vkavalerov avatar Oct 20 '25 16:10 vkavalerov

+1 would be great if this could get merged.

Xirider avatar Nov 06 '25 16:11 Xirider

+1 Please, we yearn for this

jsun-m avatar Nov 06 '25 20:11 jsun-m

+1 please, would make it one step easier for us to use this package

bencegadanyi1-nhs avatar Dec 04 '25 16:12 bencegadanyi1-nhs