PDF extraction produces plain text, not Markdown
I tried to extract the contents of a PDF, but the output is plain text, not Markdown. Am I missing a parameter?
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("microsoft_report.pdf")
print(result.text_content)

output_file = "output.md"
with open(output_file, "w", encoding="utf-8") as file:
    file.write(result.text_content)
print(f"Markdown content has been written to {output_file}")
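To check whether the converted text really has no Markdown structure, a rough heuristic can count a few common Markdown markers. This is only a sketch: `MARKDOWN_PATTERNS`, `markdown_features`, and `looks_like_plain_text` are hypothetical names made up for illustration, not part of markitdown, and the regexes will miss some Markdown and flag some plain text.

```python
import re

# Hypothetical helper, not part of markitdown: a rough heuristic that
# counts a few common Markdown constructs in a string.
MARKDOWN_PATTERNS = {
    "heading": re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE),
    "list_item": re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+\S", re.MULTILINE),
    "table_row": re.compile(r"^[ \t]*\|.+\|[ \t]*$", re.MULTILINE),
    "link": re.compile(r"\[[^\]]+\]\([^)]+\)"),
}


def markdown_features(text: str) -> dict[str, int]:
    """Return how many times each Markdown construct appears in text."""
    return {
        name: len(pattern.findall(text))
        for name, pattern in MARKDOWN_PATTERNS.items()
    }


def looks_like_plain_text(text: str) -> bool:
    """True if none of the Markdown constructs above were found."""
    return not any(markdown_features(text).values())
```

Calling `looks_like_plain_text(result.text_content)` on the conversion result above would then give a quick yes/no, and `markdown_features(result.text_content)` shows which constructs, if any, were found.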
Dealing with the exact same issue. It doesn't convert even the most basic PDF to Markdown. It just outputs plain text.
Same for us. It produces just plain text.
Same here
Same here. I am using Ubuntu 24.04 and Python 3.12.3, and I installed markitdown in a virtualenv. Is anything wrong with that setup?
See the reason here: https://github.com/microsoft/markitdown/issues/131