File format not allowed: file.docx

Open · SebastianCallh opened this issue 10 months ago · 2 comments

Bug

Running the docling `DocumentConverter` on a .docx file, as in the simple conversion example, raises `docling.exceptions.ConversionError: File format not allowed: file.docx`, even though the docling documentation lists DOCX as a supported format. The files were created in a SharePoint drive using its web interface. The expected behaviour is that the script runs without errors.

Steps to reproduce

Here is some information about the file, followed by the reproduction script:

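```console
➜ file file.docx
file.docx: Microsoft Word 2007+
➜ file --mime-type -b file.docx
application/vnd.openxmlformats-officedocument.wordprocessingml.document
```

```python
from docling.document_converter import DocumentConverter

# .docx file created in a SharePoint drive via the web interface
source = "file.docx"
converter = DocumentConverter()
# Fails here with: docling.exceptions.ConversionError: File format not allowed: file.docx
result = converter.convert(source)
print(result.document.export_to_markdown())
```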
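To dig further into a file like this, one option is to inspect its package structure directly. This assumes the failure comes from how the file's format is detected, which this thread does not confirm. The sketch below uses only the Python standard library, and `inspect_docx` is a hypothetical helper, not part of docling. It reports the first entries of the ZIP archive in stored order and which parts `[Content_Types].xml` declares as the main WordprocessingML document. Comparing its output for a SharePoint-created file and a file saved from desktop Word may help narrow the problem down.

```python
import zipfile
from xml.etree import ElementTree as ET

# Content type that OOXML assigns to the main part of a Word document.
WORD_MAIN_CT = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.document.main+xml"
)
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def inspect_docx(path: str, head: int = 5) -> dict:
    """Report basic OOXML package facts for a .docx file (hypothetical helper)."""
    if not zipfile.is_zipfile(path):
        return {"is_zip": False}
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()  # entries in archive order, not sorted
        has_ct = "[Content_Types].xml" in names
        main_parts = []
        if has_ct:
            root = ET.fromstring(zf.read("[Content_Types].xml"))
            # <Override> elements map individual part names to content types.
            for override in root.iter(f"{CT_NS}Override"):
                if override.get("ContentType") == WORD_MAIN_CT:
                    main_parts.append(override.get("PartName"))
    return {
        "is_zip": True,
        "first_entries": names[:head],
        "has_content_types": has_ct,
        "word_main_parts": main_parts,
    }


if __name__ == "__main__":
    import sys

    # Usage: python inspect_docx.py file.docx [other.docx ...]
    for p in sys.argv[1:]:
        print(p, inspect_docx(p))
```

A file with an empty `word_main_parts` list, or with an unusual entry order, would be worth attaching to the issue for comparison.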

Docling version

```
Docling version: 2.15.1
Docling Core version: 2.14.0
Docling IBM Models version: 3.1.2
Docling Parse version: 3.0.0
```
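The version block above appears to be output from the docling CLI. When the CLI is not at hand, a minimal standard-library sketch like the following collects comparable information for a bug report. It assumes the distribution names are `docling`, `docling-core`, `docling-ibm-models` and `docling-parse`; `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the components listed in the version block.
PACKAGES = ("docling", "docling-core", "docling-ibm-models", "docling-parse")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    import platform

    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
    print(f"Python: {platform.python_version()}")
```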

Python version

Python 3.12.7

SebastianCallh, Feb 11 '25 08:02

@SebastianCallh This could be related to the findings summarized in https://github.com/DS4SD/docling/issues/802. We have it on the radar.

cau-git, Feb 12 '25 14:02

@cau-git thank you for confirming! I really appreciate your work on docling. Can you share any estimate of when this might be addressed? I'm afraid it is a showstopper for our use of docling, which we would really like to adopt.

SebastianCallh, Feb 14 '25 10:02

+1

file xxx.docx Microsoft OOXML

docling.exceptions.ConversionError: File format not allowed: xxx.docx

whisper-bye, May 09 '25 03:05