markitdown issues

When trying to convert a German PDF, I get this error:

1

When trying to convert a German PDF, I get this error: `UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 15215: character maps to <undefined>`
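Python raises this `'charmap'` error when text is encoded with a legacy single-byte codec such as cp1252 (the usual Windows default), which has no slot for U+2212 (MINUS SIGN). A minimal sketch, assuming the failure happens when the converted text is written or printed with the platform default encoding: write the output explicitly as UTF-8. The helper `save_markdown` and the sample string are hypothetical and only for illustration.

```python
import tempfile
from pathlib import Path


def save_markdown(text: str, path: Path) -> None:
    """Hypothetical helper: write converted text as UTF-8, not the platform default."""
    # Without an explicit encoding, write_text uses the locale encoding
    # (often cp1252 on Windows), which cannot represent U+2212.
    path.write_text(text, encoding="utf-8")


sample = "Temperatur: \u22125 Grad"  # contains U+2212 MINUS SIGN

# Reproduce the reported failure with the cp1252 ("charmap") codec.
try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc.encoding, "-", exc.reason)  # charmap - character maps to <undefined>

# Writing explicitly as UTF-8 round-trips the text unchanged.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.md"
    save_markdown(sample, out)
    print(out.read_text(encoding="utf-8") == sample)  # True
```

In practice the text would come from `MarkItDown().convert("file.pdf").text_content`. If the error instead appears when printing to a Windows console, setting `PYTHONIOENCODING=utf-8` or running Python in UTF-8 mode (`python -X utf8`) are standard alternatives.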

tuskin40

The .xls file from test/ is failing; no others are

I can't figure out why this one is failing when nothing else is; it's driving me crazy. UPDATE: To be fair, I only tried the xls file from tests, but that should work. Images,...

markthepixel

update: change pdf text parser to pymupdf4llm

13

Uses `pymupdf4llm` instead of `pdfminer` to parse PDF contents into Markdown, as suggested by #131. Pros and cons: - `pdfminer` extracts text only; generated files have no headings, titles,...
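The con listed for plain text extraction is the lack of headings. Layout-aware extractors can recover them from font metadata; one common heuristic treats the most frequent font size as body text and larger sizes as heading levels. The sketch below applies that heuristic to hypothetical `(text, font_size)` spans. It is a simplified illustration under that assumption, not `pymupdf4llm`'s actual implementation, which works on real PDF layout data.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical input: (text, font size in points) for each line of a page.
Span = Tuple[str, float]


def spans_to_markdown(spans: List[Span], max_levels: int = 3) -> str:
    """Font-size heuristic: most common size is body text, larger sizes are headings."""
    if not spans:
        return ""
    # Weight each rounded size by character count so long paragraphs dominate.
    weights: Counter = Counter()
    for text, size in spans:
        weights[round(size)] += len(text)
    body_size = weights.most_common(1)[0][0]
    # Sizes larger than body text, biggest first, map to "#", "##", "###".
    heading_sizes = sorted((s for s in weights if s > body_size), reverse=True)
    level = {s: i + 1 for i, s in enumerate(heading_sizes[:max_levels])}
    blocks = []
    for text, size in spans:
        key = round(size)
        prefix = "#" * level[key] + " " if key in level else ""
        blocks.append(prefix + text.strip())
    return "\n\n".join(blocks)


page_spans = [
    ("Report 2024", 20),
    ("Introduction", 14),
    ("This is the first paragraph with a lot of text.", 10),
    ("Methods", 14),
    ("Another paragraph.", 10),
]
print(spans_to_markdown(page_spans))
```

Sizes larger than body text but beyond `max_levels` fall back to plain paragraphs, which keeps decorative large text from producing deep heading hierarchies.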

tungsten106

Add AsyncMarkItDown as a wrapper

9

My take on how this code should support async, given that none of the underlying libraries support async. For more details and context, see: https://github.com/microsoft/markitdown/issues/13#issuecomment-2543834157
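Because the underlying converters are synchronous, a common pattern for an async wrapper is to run each blocking call in a worker thread with `asyncio.to_thread` (Python 3.9+). A minimal sketch under that assumption; `SyncConverter` is a hypothetical stand-in for `MarkItDown`, and this is not necessarily the design used in the PR.

```python
import asyncio
import time


class SyncConverter:
    """Hypothetical stand-in for a blocking converter such as MarkItDown."""

    def convert(self, source: str) -> str:
        time.sleep(0.2)  # simulate blocking I/O or parsing
        return f"# {source}"


class AsyncConverter:
    """Async facade: delegates each call to a worker thread."""

    def __init__(self, inner: SyncConverter) -> None:
        self._inner = inner

    async def convert(self, source: str) -> str:
        # to_thread runs the blocking call in the default executor, so the
        # event loop stays free to schedule other tasks in the meantime.
        return await asyncio.to_thread(self._inner.convert, source)


async def main() -> list:
    converter = AsyncConverter(SyncConverter())
    # The three conversions overlap instead of running back to back.
    return await asyncio.gather(
        *(converter.convert(name) for name in ["a.pdf", "b.docx", "c.xlsx"])
    )


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['# a.pdf', '# b.docx', '# c.xlsx']
```

Threads suit I/O-bound work; for heavy CPU-bound parsing, `loop.run_in_executor` with a `concurrent.futures.ProcessPoolExecutor` is the usual alternative.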

0xRaduan

awaiting op response

Add json

5

This PR partly addresses issue #34. Not having found a suitable Python library, I added a JsonConverter class independent of the PlainTextConverter. In a nutshell: - parse document...
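The PR description is truncated, so the exact behavior of its `JsonConverter` isn't shown here. As a rough illustration of the general idea only, the sketch below parses a JSON document with the standard `json` module and renders it as nested Markdown bullet lists. The function names `json_to_markdown` and `convert_json` and the output layout are assumptions, not the PR's code.

```python
import json
from typing import Any, List


def _scalar(value: Any) -> str:
    # Strings are shown as-is; numbers, booleans and null use JSON spelling.
    return value if isinstance(value, str) else json.dumps(value)


def json_to_markdown(value: Any, indent: int = 0) -> List[str]:
    """Render a parsed JSON value as nested Markdown bullet lines (hypothetical layout)."""
    pad = "  " * indent
    lines: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}- **{key}**:")
                lines.extend(json_to_markdown(item, indent + 1))
            else:
                lines.append(f"{pad}- **{key}**: {_scalar(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")  # nested container becomes a sub-list
                lines.extend(json_to_markdown(item, indent + 1))
            else:
                lines.append(f"{pad}- {_scalar(item)}")
    else:
        lines.append(pad + _scalar(value))  # bare top-level scalar
    return lines


def convert_json(text: str) -> str:
    """Parse a JSON document and return it as Markdown."""
    return "\n".join(json_to_markdown(json.loads(text)))


print(convert_json('{"name": "markitdown", "tags": ["pdf", "docx"], "meta": {"stars": 1}}'))
```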

Gad

Extraction is not in markdown

5

I tried to extract the contents of a PDF, but it is extracted as plain text, not as Markdown. Am I missing a parameter? `from markitdown import MarkItDown; md = MarkItDown()`...

harinisri2001

The program has become unresponsive.

2

When I used MarkItDown to parse an 80 MB XLSX file (over 4.6 million rows of data), the program ran for eight hours and then...

MoonS11

Would you consider raising the exceptions in _markitdown.py?

1

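![Image](https://github.com/user-attachments/assets/6c194d40-2ada-40a2-8939-c1004e2630eb) When batch-reading xlsx files, a corrupted file terminates the program outright. The cause is that the exception is not raised here, so my outer code cannot catch it.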
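The report describes a batch of xlsx conversions where one corrupted file stops the whole run. If the library propagates errors as ordinary exceptions, the caller can isolate each file. A minimal sketch of such a caller-side loop; `convert_batch` and `fake_convert` are hypothetical stand-ins for the real conversion call, and the sketch does not show how `_markitdown.py` itself handles errors.

```python
from typing import Callable, Dict, List, Tuple


def convert_batch(
    paths: List[str], convert_one: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Convert each file independently; one bad file does not stop the batch."""
    results: Dict[str, str] = {}
    failures: Dict[str, Exception] = {}
    for path in paths:
        try:
            results[path] = convert_one(path)
        except Exception as exc:  # record the error and move on to the next file
            failures[path] = exc
    return results, failures


def fake_convert(path: str) -> str:
    """Hypothetical converter that fails on a 'corrupt' file."""
    if "corrupt" in path:
        raise ValueError(f"cannot parse {path}")
    return f"converted {path}"


ok, bad = convert_batch(["a.xlsx", "corrupt.xlsx", "b.xlsx"], fake_convert)
print(sorted(ok))  # ['a.xlsx', 'b.xlsx']
print({k: str(v) for k, v in bad.items()})  # {'corrupt.xlsx': 'cannot parse corrupt.xlsx'}
```

Note that `except Exception` does not catch `SystemExit`, so this pattern only helps if the library raises ordinary exceptions rather than exiting the process.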

cyKron613

Does not work with Persian documents

1

Hi there, it does not work with the Persian language. Result: [markdown_test.md](https://github.com/user-attachments/files/18279834/%2B.%2B.%2B.%2B.%2B.%2B.%2B.%2B.md)

arasrezaei

Create advanced PDF convertor

3

> > [@gagb](https://github.com/gagb) Would be great to have this as an example in the README! Thanks.
>
> Agreed. IMO, a PDF-based example would be best where it is...

gagb

enhancement

open for contribution
