[Feature] Add HTML content string support to converter
Currently, MarkItDown.convert's source argument only accepts a path (Path | str), a URL, or a requests.Response object.
I'm building a cross-platform app, and because of the file system access restrictions Android (v10+) imposes, I can't access file paths that point outside my app's internal directory tree, which is sandboxed. However, I can use native APIs to read the file's contents as a string and pass that to MarkItDown.convert.
It would be excellent if MarkItDown.convert's source argument accepted HTML content as a string, or if there were a separate method just for that, such as converts (for "convert string").
This would mirror json.load(), which reads JSON from a file object, and json.loads(), which parses JSON content from a string.
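For reference, here is the standard-library pattern being cited. Note that json.load() takes a file-like object rather than a path, so the "s" variant is essentially the same parser fed from a string; wrapping the string in io.StringIO gives identical results without touching the file system.

```python
import io
import json

text = '{"title": "Hello", "tags": ["a", "b"]}'

# json.loads() parses a str directly...
from_string = json.loads(text)

# ...while json.load() reads from a file-like object. Wrapping the same
# string in an in-memory stream avoids the need for a real file.
from_stream = json.load(io.StringIO(text))

print(from_string == from_stream)  # True
```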
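Try the code below:
import io
from markitdown import MarkItDown
md = MarkItDown(enable_plugins=False)
# Example content; in the app, this is the string read through the native APIs.
html_content = "<h1>Title</h1><p>Body text</p>"
# Encode the string to bytes and wrap it in an in-memory binary stream.
stream = io.BytesIO(html_content.encode("utf-8"))
result = md.convert(stream).text_content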
That works. Thank you!
I'd still leave the ticket open, because a dedicated convenience function makes sense to me, especially since the standard library's parsers (like json and xml) follow a similar pattern.
But let me know.
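A minimal sketch of what the proposed convenience function could look like, written as a standalone helper rather than a MarkItDown method. The names convert_string and StreamConverter are hypothetical and not part of MarkItDown's API; the sketch only assumes, as the io.BytesIO workaround in the reply shows, that convert() accepts a binary stream.

```python
import io
from typing import Any, BinaryIO, Protocol


class StreamConverter(Protocol):
    """Hypothetical protocol: anything whose convert() accepts a binary stream."""

    def convert(self, source: BinaryIO, **kwargs: Any) -> Any: ...


def convert_string(converter: StreamConverter, content: str,
                   encoding: str = "utf-8", **kwargs: Any) -> Any:
    """Hypothetical 'converts' helper: convert in-memory text without a file path."""
    # Encode the text and wrap it in an in-memory binary stream,
    # then forward any extra keyword arguments to convert() unchanged.
    stream = io.BytesIO(content.encode(encoding))
    return converter.convert(stream, **kwargs)


# Usage (requires the markitdown package):
# from markitdown import MarkItDown
# md = MarkItDown(enable_plugins=False)
# print(convert_string(md, "<h1>Hello</h1>").text_content)
```

Taking the converter as a parameter keeps the helper independent of MarkItDown's internals; a built-in version would more likely be a method on the MarkItDown class itself.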