Extraction is not in markdown

Open harinisri2001 opened this issue 1 year ago • 5 comments

I tried to extract the contents of a PDF, but the output is plain text, not Markdown. Am I missing a parameter?

from markitdown import MarkItDown

md = MarkItDown()

result = md.convert("microsoft_report.pdf")
print(result.text_content)

output_file = "output.md"
with open(output_file, "w", encoding="utf-8") as file:
    file.write(result.text_content)
print(f"Markdown content has been written to {output_file}")

Dec 24 '24 09:12 harinisri2001
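
One quick way to confirm the symptom reported here is to check whether the converted text contains any Markdown structure at all. The following is a minimal sketch, assuming a simple regex heuristic is good enough for a sanity check; the helper name `has_markdown_structure` is hypothetical and is not part of markitdown.

```python
import re

# Hypothetical helper (not a markitdown API): a rough heuristic that looks
# for common Markdown constructs. It can misfire on plain text that happens
# to contain dashes or numbered lines, so treat the result as a hint only.
_MARKDOWN_PATTERNS = [
    re.compile(r"^#{1,6}\s+\S", re.MULTILINE),    # ATX headings: "# Title"
    re.compile(r"^\s*[-*+]\s+\S", re.MULTILINE),  # bullet list items
    re.compile(r"^\s*\d+\.\s+\S", re.MULTILINE),  # numbered list items
    re.compile(r"^\|.*\|\s*$", re.MULTILINE),     # pipe table rows
    re.compile(r"\[[^\]]+\]\([^)]+\)"),           # inline links
]


def has_markdown_structure(text: str) -> bool:
    """Return True if any of the Markdown patterns above occurs in text."""
    return any(pattern.search(text) for pattern in _MARKDOWN_PATTERNS)
```

Calling `has_markdown_structure(result.text_content)` on the output of the snippet above would return False if the PDF came back as plain text with none of these constructs.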

Dealing with the exact same issue. It doesn't convert even the most basic pdf to markdown. It just outputs plain text.

Dec 31 '24 00:12 sakariye

Same for us. It produces just plain text.

Jan 03 '25 15:01 demirag

Same here

Jan 06 '25 13:01 anerathil

Same here. I am using Ubuntu 24.04 and Python 3.12.3, and I installed markitdown in a virtualenv. Is anything wrong?

Jan 08 '25 08:01 zsimple

See the reason here: https://github.com/microsoft/markitdown/issues/131

Jan 13 '25 09:01 huineng