markitdown
Python tool for converting files and office documents to Markdown.
When trying to convert a German PDF, I get this error: `UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 15215: character maps to <undefined>`
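This kind of error typically appears when text is written through a legacy code page such as cp1252, which has no mapping for U+2212 (MINUS SIGN). A minimal sketch, assuming the converted Markdown is already held in a string and the failure happens while writing it out; the sample string and the file name `output.md` are illustrative only and do not come from the report:

```python
import os
import tempfile

# Illustrative text containing U+2212 (MINUS SIGN), the character named in the error.
text = "Temperatur: \u22125 \u00b0C"

# Encoding with a legacy Windows code page reproduces the failure.
try:
    text.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

# Writing with an explicit UTF-8 encoding avoids depending on the platform default.
out_path = os.path.join(tempfile.mkdtemp(), "output.md")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()
```

If the error instead comes from printing to the console, setting the environment variable `PYTHONIOENCODING=utf-8` or `PYTHONUTF8=1` before running Python may be another option.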
I cannot figure out why this is failing, and it's driving me crazy. UPDATE: To be fair, I only tried the xls from tests, but that should work. Images,...
Using `pymupdf4llm` instead of `pdfminer` to parse PDF contents into Markdown, as suggested by #131. Pros and cons: - `pdfminer` extracts text only; generated files have no headings, titles,...
My take on how this code should support async, given that none of the underlying libraries support async. For more details/context, see: https://github.com/microsoft/markitdown/issues/13#issuecomment-2543834157
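One common way to expose a synchronous library through an async interface is to run the blocking call in a worker thread. The sketch below illustrates that general technique only; it is not necessarily the approach taken in the linked comment, and `convert_blocking`, `convert_async`, and `convert_many` are hypothetical names standing in for a real conversion call:

```python
import asyncio
import time


def convert_blocking(source: str) -> str:
    """Hypothetical stand-in for a synchronous conversion call."""
    time.sleep(0.05)  # simulate blocking I/O or parsing
    return f"# Converted {source}"


async def convert_async(source: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(convert_blocking, source)


async def convert_many(sources: list[str]) -> list[str]:
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(convert_async(s) for s in sources))


results = asyncio.run(convert_many(["a.pdf", "b.docx"]))
```

`asyncio.to_thread` requires Python 3.9 or later. Threads help when the blocking work is mostly I/O; CPU-bound parsing would still contend for the interpreter lock.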
Add json
The proposed PR goes some way toward addressing issue #34. Not having found a suitable Python library, I added a JsonConverter class independent of the PlainTextConverter. In a nutshell: - parse document...
I tried to extract the contents of a PDF, but it is extracted as plain text, not as Markdown. Am I missing a parameter? `from markitdown import MarkItDown` `md = MarkItDown()`...
When I used MarkItDown to parse an 80-megabyte XLSX file (containing over 4.6 million rows of data), the program ran for eight hours and then...
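When batch-reading xlsx files, a corrupted file aborts the program outright. The cause is that no exception is raised here, so my outer code cannot catch it.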
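The behavior the reporter expects, where a corrupted file raises an exception that the calling code can catch and move past, can be sketched as follows. This is a minimal illustration under stated assumptions: `convert_batch` and `fake_convert` are hypothetical helpers, not part of the library, and the pattern only works if the underlying converter raises instead of terminating the process:

```python
from typing import Callable


def convert_batch(paths: list[str], convert: Callable[[str], str]):
    """Convert each path, collecting failures instead of aborting the batch."""
    results: dict[str, str] = {}
    failures: dict[str, Exception] = {}
    for path in paths:
        try:
            results[path] = convert(path)
        except Exception as exc:  # reached only if the converter raises
            failures[path] = exc
    return results, failures


def fake_convert(path: str) -> str:
    # Hypothetical converter: raises on a file treated as corrupted.
    if "corrupt" in path:
        raise ValueError(f"cannot read {path}")
    return f"converted {path}"


ok, failed = convert_batch(["a.xlsx", "corrupt.xlsx", "b.xlsx"], fake_convert)
```

Here `ok` holds the two successful conversions and `failed` maps the corrupted file to its exception, so the remaining files are still processed.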
Hi there, it does not work with the Persian language. Results in: [markdown_test.md](https://github.com/user-attachments/files/18279834/%2B.%2B.%2B.%2B.%2B.%2B.%2B.%2B.md)
> > [@gagb](https://github.com/gagb) Would be great to have this as an example in the README! Thanks. > > Agreed. IMO, a PDF based example would be best where it is...