bug: docx conversion does not work
It also can't parse doc files.
I want to contribute to this, can I?
I'm having the same issue.
Can you provide the documents you have tested with?
I just got markitdown and tried it with a file and got:
[rashino@archrailgun Downloads]$ markitdown Refined\ Homelab\ Service\ Metaplan_.docx
Traceback (most recent call last):
File "/usr/bin/markitdown", line 8, in <module>
sys.exit(main())
~~~~^^
File "/usr/lib/python3.13/site-packages/markitdown/__main__.py", line 197, in main
result = markitdown.convert(
args.filename, stream_info=stream_info, keep_data_uris=args.keep_data_uris
)
File "/usr/lib/python3.13/site-packages/markitdown/_markitdown.py", line 260, in convert
return self.convert_local(source, stream_info=stream_info, **kwargs)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.13/site-packages/markitdown/_markitdown.py", line 311, in convert_local
guesses = self._get_stream_info_guesses(
file_stream=fh, base_guess=base_guess
)
File "/usr/lib/python3.13/site-packages/markitdown/_markitdown.py", line 675, in _get_stream_info_guesses
result = self._magika.identify_stream(file_stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Magika' object has no attribute 'identify_stream'. Did you mean: 'identify_bytes'?
[rashino@archrailgun Downloads]$
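For anyone hitting the same AttributeError: the traceback shows markitdown calling `Magika.identify_stream`, which the installed magika package does not provide. That usually points to a magika version older than the one markitdown expects, though I have not confirmed this on the reporter's machine. Here is a small sketch to check the local install. The helper name `magika_status` is made up for this example; it only uses `importlib.metadata` and a `hasattr` check on the `Magika` class.

```python
from importlib import metadata


def magika_status():
    """Return (installed_version, has_identify_stream) for the local magika install.

    Hypothetical helper for illustration; not part of markitdown or magika.
    """
    try:
        version = metadata.version("magika")
    except metadata.PackageNotFoundError:
        # magika is not installed in this environment at all.
        return None, False
    try:
        from magika import Magika
    except Exception:
        # The import can fail for reasons other than ImportError,
        # for example a broken native dependency.
        return version, False
    # Inspect the class rather than an instance so no model gets loaded.
    return version, hasattr(Magika, "identify_stream")


if __name__ == "__main__":
    version, ok = magika_status()
    if version is None:
        print("magika is not installed")
    elif ok:
        print(f"magika {version} provides Magika.identify_stream")
    else:
        print(f"magika {version} lacks Magika.identify_stream; try upgrading magika")
```

If the last message is printed, upgrading magika in the same environment that runs `/usr/bin/markitdown` (here the system Python 3.13) is the first thing I would try.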
It works on this doc: https://calibre-ebook.com/downloads/demos/demo.docx
@dev4mobile - your error references w:ilvl, which looks like a numbered list that is not properly defined.
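To check whether a document has this kind of list problem, something like the sketch below may help. It rests on my assumption that the failure comes from a paragraph whose `w:numPr` has a `w:ilvl` but either no `w:numId` or a `w:numId` that `word/numbering.xml` does not define; I have not verified that against the file in question. The function name `find_list_problems` is made up for this example, and it uses only the standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml and word/numbering.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def find_list_problems(docx_file):
    """Return (paragraph_index, message) pairs for suspicious list paragraphs.

    Hypothetical helper for illustration; docx_file is a path or a binary file object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        defined = set()
        if "word/numbering.xml" in zf.namelist():
            numbering = ET.fromstring(zf.read("word/numbering.xml"))
            # Each w:num child of the root declares one usable numId.
            for num in numbering.findall("w:num", NS):
                defined.add(num.get(f"{{{W}}}numId"))
        document = ET.fromstring(zf.read("word/document.xml"))

    problems = []
    for index, para in enumerate(document.iter(f"{{{W}}}p")):
        num_pr = para.find("w:pPr/w:numPr", NS)
        if num_pr is None:
            continue  # not a list paragraph
        num_id = num_pr.find("w:numId", NS)
        if num_id is None:
            problems.append((index, "w:numPr has no w:numId"))
            continue
        val = num_id.get(f"{{{W}}}val")
        # numId 0 means "numbering removed", so it needs no definition.
        if val != "0" and val not in defined:
            problems.append((index, f"numId {val} is not defined in word/numbering.xml"))
    return problems
```

Calling `find_list_problems("your-file.docx")` returns an empty list when every list paragraph points at a defined numbering instance. Any entries it does return are the paragraphs I would look at first.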
It also can't parse doc files.
So, how do I fix it? I have the same problem.