feat/Markdown Extraction
JSON Extraction: I want to feed the Unstructured extraction output directly to an LLM without losing the metadata produced by partitioning, but JSON is not a recommended input format for an LLM.
MARKDOWN Extraction: Make it possible to choose the extraction format, either JSON or MARKDOWN (with all elements rendered so that the semantic structure of the document is kept), or provide a "convert_to_markdown" function.
JSON to MARKDOWN custom conversion: I would have to code this myself.
Additional context: See PyMuPDF as a benchmark.
I want to second this feature request.
I would like to see a function that converts the output (elements) into Markdown. This would make it much easier to check whether the conversion from the original file was correct or whether errors occurred along the way. At the moment, I have to click through a list of elements, which is very inconvenient for debugging.
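In case it helps the discussion, here is a minimal sketch of what such a function could look like. It is not part of the library: `convert_to_markdown` is a hypothetical helper, and it assumes the elements are plain dicts shaped like the JSON output, with `type` and `text` keys and an optional `metadata` dict. Treating `metadata["category_depth"]` as the heading level and `metadata["text_as_html"]` as the table body are assumptions on my side as well.

```python
from typing import Any


def convert_to_markdown(elements: list[dict[str, Any]]) -> str:
    """Render element dicts as Markdown (hypothetical helper, not a library API)."""
    blocks: list[str] = []
    list_items: list[str] = []

    def flush_list() -> None:
        # Emit consecutive list items as a single Markdown list block.
        if list_items:
            blocks.append("\n".join(list_items))
            list_items.clear()

    for element in elements:
        text = (element.get("text") or "").strip()
        if not text:
            continue  # skip elements without text, e.g. page breaks
        kind = element.get("type", "")
        metadata = element.get("metadata") or {}

        if kind == "ListItem":
            list_items.append(f"- {text}")
            continue
        flush_list()

        if kind == "Title":
            # Assumption: category_depth 0 is the top-level heading.
            depth = int(metadata.get("category_depth") or 0)
            level = min(depth + 1, 6)
            blocks.append(f"{'#' * level} {text}")
        elif kind == "Table" and metadata.get("text_as_html"):
            # Markdown renderers generally accept inline HTML tables.
            blocks.append(metadata["text_as_html"])
        else:
            # Everything else is rendered as a plain paragraph.
            blocks.append(text)

    flush_list()
    return "\n\n".join(blocks)
```

Printing the result for a partitioned file would give one readable document to compare against the original, instead of a list of elements to click through. Metadata such as page numbers or coordinates is dropped in this sketch; it would need its own representation (for example, HTML comments) if it has to survive the conversion.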
I also second this feature!
This feature would be really helpful.
It would be very helpful if we could extract data as Markdown from any file, for example PDF to Markdown or DOCX to Markdown.
This would be really helpful. I would like to see this function and use it in my project.