markitdown
UnicodeEncodeError: 'gbk' codec can't encode character '\u2009' in position 390: illegal multibyte sequence
same issue
It seems that issue https://github.com/microsoft/markitdown/issues/198 has solved this problem. If you still encounter it, here is my solution:
- Write a .py file (let's say it is converter.py):
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("your-file-name")
print(result.text_content)
- Redirect the output text into a Markdown file:
$ python converter.py > your-file-name.md
That worked for me.
If the above code does not work, you can try the following:
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("your-file-name.pptx")
text = result.text_content
# Save to a file
with open("your-file-name.md", "w", encoding="utf-8") as f:
    f.write(text)
# Print to the console
# print(text)
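To see why writing the file with encoding="utf-8" avoids the error, here is a minimal sketch that tries to encode the character from the error message ('\u2009', a thin space) with both codecs. The helper name can_encode is only illustrative and is not part of markitdown.

```python
THIN_SPACE = "\u2009"  # the character named in the UnicodeEncodeError above


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(THIN_SPACE, "gbk"))    # False: same failure as in the report
print(can_encode(THIN_SPACE, "utf-8"))  # True: UTF-8 covers every code point
```

So any output path that falls back to gbk can fail on this character, while an explicit UTF-8 file handle does not.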
This works for me:
markitdown < 1.pdf
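If the text still has to go through stdout (print or a shell redirect), one possible workaround is to switch the stream to UTF-8 before writing. This is a sketch, assuming Python 3.7+ where text streams have reconfigure(). The helper force_utf8 and the in-memory fake_stdout are stand-ins for illustration only; in a real script you would call force_utf8(sys.stdout) before print(result.text_content).

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 output and return the same stream."""
    stream.reconfigure(encoding="utf-8")
    return stream


# Stand-in for a console or redirect whose default encoding is gbk.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="gbk")

force_utf8(fake_stdout)
fake_stdout.write("a\u2009b")  # would raise UnicodeEncodeError under gbk
fake_stdout.flush()

data = raw.getvalue()
print(data)  # b'a\xe2\x80\x89b'
```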