unstructured feat/Markdown Extraction

JSON Extraction: I want to use Unstructured's extraction output to feed an LLM directly without losing the metadata produced during partitioning, but JSON is not a recommended input format for LLMs.

MARKDOWN Extraction: Add an option to choose the extraction format, either JSON or MARKDOWN (with every element formatted so that the semantic structure of the document is preserved), or provide a "convert_to_markdown" function.

JSON to MARKDOWN custom conversion: The alternative is a custom JSON-to-Markdown conversion, which I currently have to code myself.
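A minimal sketch of such a custom conversion, assuming the JSON output is a list of element dicts with `type`, `text`, and `metadata` keys. The handled type names (`Title`, `ListItem`, `Table`, `Header`, `Footer`, `PageBreak`) and the `category_depth` and `text_as_html` metadata fields are assumptions about that schema, and `elements_to_markdown` is a hypothetical helper, not part of the Unstructured API.

```python
import json
from typing import Any


def elements_to_markdown(elements: list[dict[str, Any]]) -> str:
    """Hypothetical helper: render a list of element dicts as Markdown.

    Assumes each dict has "type", "text" and an optional "metadata" dict,
    mirroring the element JSON that partitioning produces.
    """
    blocks: list[str] = []
    for el in elements:
        kind = el.get("type", "")
        text = (el.get("text") or "").strip()
        meta = el.get("metadata") or {}

        if kind in ("Header", "Footer", "PageBreak"):
            # Drop page furniture so it does not split the body text.
            continue
        if kind == "Table" and meta.get("text_as_html"):
            # Keep the HTML rendering; CommonMark passes raw HTML blocks through.
            blocks.append(meta["text_as_html"])
        elif not text:
            continue
        elif kind == "Title":
            # Assumed: category_depth 0 is top level; map it to heading levels 1..6.
            depth = meta.get("category_depth") or 0
            level = min(int(depth) + 1, 6)
            blocks.append(f"{'#' * level} {text}")
        elif kind == "ListItem":
            blocks.append(f"- {text}")
        else:
            # NarrativeText, UncategorizedText, etc. become plain paragraphs.
            blocks.append(text)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    raw = """[
      {"type": "Title", "text": "Report", "metadata": {"category_depth": 0}},
      {"type": "NarrativeText", "text": "Intro paragraph.", "metadata": {}},
      {"type": "ListItem", "text": "First point", "metadata": {}},
      {"type": "Footer", "text": "Page 1", "metadata": {}}
    ]"""
    print(elements_to_markdown(json.loads(raw)))
```

Tables are kept as HTML rather than rewritten as pipe tables because merged cells do not map cleanly onto Markdown table syntax; whether headers and footers should be dropped is a judgment call that depends on whether the Markdown is meant for an LLM or for checking the conversion.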

Additional context: See PyMuPDF as a benchmark.

Aug 15 '24 07:08 pat-ben

I want to second this feature request.

I would like to see a function that converts the output (elements) into Markdown. This would make it much easier to check whether the conversion from the original file was correct or whether errors occurred during conversion. At the moment, I have to click through a list of elements, which is very inconvenient for debugging.

Jan 27 '25 18:01 Netzeband

I also second this feature!

Feb 18 '25 17:02 mishoco

This feature would be really helpful.

Mar 18 '25 08:03 myteberib

This would be very helpful: being able to extract data as Markdown from any file, e.g. PDF to Markdown or DOCX to Markdown.

Sep 23 '25 00:09 raviranjan31

This would be really helpful. I would like to see this function and use it in my project.

Dec 10 '25 13:12 Emanuellefelix