
Support for .doc extensions

Open mictab opened this issue 1 year ago • 8 comments

Are you planning to offer .doc support in addition to the current .docx support?

mictab avatar Dec 14 '24 14:12 mictab

created PR https://github.com/microsoft/markitdown/pull/36

aviral-bhardwaj avatar Dec 15 '24 14:12 aviral-bhardwaj

Patiently waiting for good news.

wfnian avatar Dec 19 '24 07:12 wfnian

Just wanted to add a data point. I counted files by type across our server, where I was considering using markitdown:

  1. PDF - 1,150k files
  2. JPG - 243k files
  3. DOC - 112k files
  4. XLS - 70k files
  5. XLSX - 69k files
  6. DOCX - 46k files

I think this demonstrates just how important DOC file handling is, even in 2025.
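
For anyone who wants to run a similar count on their own file share, here is a minimal sketch. It is not the script used for the numbers above. The helper names `tally_extensions` and `count_tree` are hypothetical, and the sketch groups files purely by extension, which is an assumption about how such a check might be done.

```python
from collections import Counter
from pathlib import Path


def tally_extensions(names):
    """Count file names by lower-cased extension (hypothetical helper)."""
    counts = Counter()
    for name in names:
        # ".DOC" and ".doc" are treated as the same type.
        ext = Path(name).suffix.lower().lstrip(".")
        counts[ext or "(none)"] += 1
    return counts


def count_tree(root):
    """Walk a directory tree and tally every regular file by extension."""
    files = (p.name for p in Path(root).rglob("*") if p.is_file())
    return tally_extensions(files)
```

Calling `count_tree("/srv/share").most_common(10)` (the path is a placeholder) would list the ten most frequent extensions, which is enough to see how legacy `.doc` compares with `.docx` on a given server.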

On a side note, what's not shown in the above list is the number of PDFs that are generated content versus scanned documents. Most are probably scanned, and 95% of the scanned PDFs should be searchable images.

Other related PRs / discussions not already linked above:

  • #281
  • #335
  • #1220

Shane32 avatar Apr 30 '25 02:04 Shane32

raised PR #1316

ashmod avatar Jul 08 '25 11:07 ashmod

Hi, I'm interested in working on this issue if it's still open. I'm new to open source and would appreciate any guidance.

Chandu378 avatar Jul 19 '25 07:07 Chandu378

I saw that some people have already made an effort and developed this functionality, which shows its importance to the community. I'd like to better understand what the assumptions are after the PR and why there's been a delay in providing support for .doc files.

tifilipebr avatar Jul 19 '25 22:07 tifilipebr

I've recently tested #1316. It works well on both Linux and Windows with no need for third-party installations (provided MS Office is installed on the Windows client).

ashmod avatar Aug 01 '25 17:08 ashmod

The open-source LibreOffice has a CLI converter that also works for legacy .doc files. It's a bit bulky, but it doesn't require an MS Office license and can be set up on Linux machines.
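
As a rough sketch of that route, the snippet below shells out to LibreOffice in headless mode to turn a `.doc` into a `.docx`, which markitdown already handles. It assumes the `soffice` binary is on `PATH` (the executable name can differ by platform or install). The function names `build_convert_command` and `convert_doc_to_docx` are hypothetical and not part of markitdown or any linked PR.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless .doc -> .docx conversion."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "docx",  # target format
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir, soffice="soffice"):
    """Convert one file and return the expected path of the .docx output."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH; is LibreOffice installed?")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    # LibreOffice writes the output using the input's base name.
    return out_dir / (Path(doc_path).stem + ".docx")
```

The returned `.docx` path could then be passed to markitdown as usual. Starting a LibreOffice process per file is slow, so for a large archive it may be worth passing several input files to a single `soffice` invocation instead.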

pazars avatar Sep 19 '25 05:09 pazars