
[Feature] Add HTML content string support to converter

Open · hyuri opened this issue 7 months ago · 2 comments

Currently, `MarkItDown.convert`'s `source` argument only accepts a path (`Path | str`), a URL, or a `requests.Response` object.

I'm building a cross-platform app. Because of the file-system access restrictions Android imposes (Android 10+), I can't use paths to files outside my app's internal directory tree, which is sandboxed. I can, however, use native APIs to read a file's contents as a string and pass that to `MarkItDown.convert`.

It would be excellent if `MarkItDown.convert`'s `source` argument accepted HTML content as a string, or if there were a separate method just for that, such as `converts` (for "convert string").

This would mirror `json.load()`, which reads JSON from a file object, and `json.loads()`, which parses JSON from a string.
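A string-based convenience like this could be a thin wrapper over a stream-based converter. The sketch below assumes only that the wrapped converter accepts a binary file-like object; `make_converts` and `converts` are hypothetical names for illustration, not part of MarkItDown's API.

```python
import io
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def make_converts(convert: Callable[[BinaryIO], T], encoding: str = "utf-8") -> Callable[[str], T]:
    """Turn a stream-based converter into a string-based one.

    Hypothetical helper for illustration; not part of MarkItDown.
    """

    def converts(content: str) -> T:
        # Encode the text and expose the bytes as an in-memory binary stream,
        # so the wrapped converter never needs a file-system path.
        return convert(io.BytesIO(content.encode(encoding)))

    return converts


# Stand-in converter for demonstration: it just decodes the stream back to text.
echo = make_converts(lambda stream: stream.read().decode("utf-8"))
print(echo("<h1>Hello</h1>"))
```

With MarkItDown installed, passing `MarkItDown(enable_plugins=False).convert` instead of the stand-in would give a `converts(html)` whose result exposes `.text_content`. In recent versions, binding a format hint first (for example `functools.partial(md.convert, stream_info=StreamInfo(extension=".html"))`) gives MarkItDown an explicit hint rather than relying only on content detection.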

Posted by hyuri, May 01 '25 09:05

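Try the code below:

```python
import io
from markitdown import MarkItDown, StreamInfo

md = MarkItDown(enable_plugins=False)
html_content = "<h1>Title</h1><p>Body text</p>"  # e.g. text read via native APIs
# Wrap the UTF-8 bytes in a binary stream, which convert() accepts
stream = io.BytesIO(html_content.encode("utf-8"))
# Hint that the bytes are HTML so MarkItDown picks its HTML converter
result = md.convert(stream, stream_info=StreamInfo(extension=".html")).text_content
```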

Posted by simonxiao86, May 03 '25 07:05

That works. Thank you!

I'd leave the ticket open, though, because a dedicated convenience method still makes sense to me, especially since the built-in parsers follow a similar pattern (e.g., json and xml).

But let me know.
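The built-in pattern referred to here can be seen in the standard library: each string-based function gives the same result as wrapping the text in an in-memory file and calling the stream-based one. A small illustration using only `json` and `xml.etree.ElementTree`:

```python
import io
import json
import xml.etree.ElementTree as ET

text = '{"title": "Hello", "tags": ["a", "b"]}'
# json.load() reads from a file object; json.loads() parses the string directly
from_stream = json.load(io.StringIO(text))
from_string = json.loads(text)
assert from_stream == from_string

xml_text = "<doc><title>Hello</title></doc>"
# ET.parse() takes a file name or file object; ET.fromstring() takes the text itself
stream_root = ET.parse(io.StringIO(xml_text)).getroot()
string_root = ET.fromstring(xml_text)
assert stream_root.find("title").text == string_root.find("title").text
```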

Posted by hyuri, May 04 '25 14:05