kotaemon
[REQUEST] markitdown for file parsing
Reference Issues
No response
Summary
MarkItDown converts files of various formats into Markdown: https://github.com/microsoft/markitdown. Maybe it could act as a pre-processor, so that docling takes the Markdown as input instead of the original file? This should be a user option.
Basic Example
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
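A rough sketch of how the optional pre-processing step might look, in case it helps the discussion. Only `MarkItDown().convert(...)` and `.text_content` come from the example above; the names `preprocess`, `use_markitdown`, `out_dir`, and `convert` are hypothetical and not part of kotaemon or docling, and the assumption that the downstream parser accepts a path to a `.md` file has not been verified.

```python
from pathlib import Path
from typing import Callable


def to_markdown(path: str) -> str:
    """Convert a file to Markdown text with MarkItDown (imported lazily)."""
    # Imported here so the option can stay off without the dependency installed.
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content


def preprocess(
    path: str,
    use_markitdown: bool = False,  # hypothetical user option
    out_dir: str = ".",
    convert: Callable[[str], str] = to_markdown,
) -> str:
    """Return the path that should be handed to the downstream parser.

    With the option off, the original file is passed through untouched.
    With it on, the file is converted and the Markdown copy is returned.
    """
    if not use_markitdown:
        return path

    md_text = convert(path)
    out_path = Path(out_dir) / (Path(path).stem + ".md")
    out_path.write_text(md_text, encoding="utf-8")
    return str(out_path)
```

Keeping the converter as a parameter means the option defaults to the current behaviour, and the conversion step can be swapped or stubbed out while the library is still young.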
Drawbacks
The library is new: it was published only 3 weeks ago.
Additional information
No response