feat: Add 'convert_local_content' method to directly convert file content (str)
Hello, I may have missed it, but I couldn't find a way to directly convert the content of a file (a str) into Markdown. This PR adds a method for that and includes its unit tests.
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert_local_content("<h1>Hello World!</h1>", file_extension=".html")
print(result.text_content)
-->
# Hello World!
@microsoft-github-policy-service agree
MarkItDown deals with byte streams. You can get the same behavior by doing:
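import io
from markitdown import MarkItDown
markitdown = MarkItDown()
# Wrap the raw bytes in an in-memory stream and pass the extension as a format hint.
input_data = b"<html><body><h1>Test</h1></body></html>"
result = markitdown.convert_stream(io.BytesIO(input_data), file_extension=".html")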
If it's a string, then perhaps:
import io
input_data = "<html><body><h1>Test</h1></body></html>".encode("utf-8")
result = markitdown.convert_stream(io.BytesIO(input_data), file_extension=".html")
If this is a common enough pattern, I could imagine creating a convenience method, perhaps convert_string, but I would prefer the more explicit approach above to adding a new entry point to maintain. Please let me know what you think.
I think this pattern is common in web spiders: they grab an article's content as HTML and convert it to Markdown.
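For reference, the convenience wrapper floated above could be sketched roughly as follows. This is only an illustration, not part of MarkItDown: the names string_to_stream and convert_string are hypothetical, and the sketch assumes the converter object exposes convert_stream(stream, file_extension=...) as used earlier in this thread.

```python
import io


def string_to_stream(text: str, encoding: str = "utf-8") -> io.BytesIO:
    """Encode a str and wrap it in an in-memory byte stream (hypothetical helper)."""
    return io.BytesIO(text.encode(encoding))


def convert_string(converter, text: str, file_extension: str, encoding: str = "utf-8"):
    """Hypothetical convenience wrapper around convert_stream.

    `converter` is assumed to be any object with a
    convert_stream(stream, file_extension=...) method, e.g. a MarkItDown instance.
    """
    stream = string_to_stream(text, encoding)
    return converter.convert_stream(stream, file_extension=file_extension)
```

With a real instance, a spider could then call something like convert_string(MarkItDown(), article_html, ".html"), where article_html is the scraped page content. Keeping the helper in user code rather than in the library avoids adding a new entry point to maintain.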