
feat/Markdown Extraction

Open pat-ben opened this issue 1 year ago • 4 comments

JSON Extraction: I want to feed the output of the Unstructured extraction directly to an LLM without losing the metadata produced by partitioning, but the JSON format is not recommended as LLM input.

MARKDOWN Extraction: Make it possible to choose the extraction format, either JSON or MARKDOWN (with all elements rendered so that the semantic structure of the document is kept), OR provide a function "convert_to_markdown".

JSON to MARKDOWN custom conversion: At the moment I need to code it myself.

Additional context: See PyMuPDF as a benchmark.
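To make the request concrete, here is a minimal sketch of the kind of custom JSON to Markdown conversion described above. It is not part of the Unstructured library. It assumes the JSON output is a list of element dicts with "type", "text", and "metadata" keys, that tables carry an HTML rendering under metadata "text_as_html", and that titles may carry a "category_depth". The function name `elements_to_markdown` and the mapping rules (titles to headings, list items to bullets, headers and footers dropped) are my own choices for illustration.

```python
def elements_to_markdown(elements):
    """Render a list of Unstructured-style element dicts as a Markdown string.

    Assumed input shape (not verified against every library version):
    {"type": "...", "text": "...", "metadata": {...}}
    """
    blocks = []    # finished Markdown blocks, joined by blank lines at the end
    list_run = []  # consecutive list items, emitted together as one list

    def flush_list():
        # Close the current run of list items, if any.
        if list_run:
            blocks.append("\n".join(list_run))
            list_run.clear()

    for el in elements:
        kind = el.get("type", "")
        text = (el.get("text") or "").strip()
        meta = el.get("metadata") or {}

        if kind == "ListItem":
            if text:
                list_run.append(f"- {text}")
            continue
        flush_list()

        if kind == "Title":
            # Assumption: category_depth 0 is the top heading level.
            depth = meta.get("category_depth") or 0
            level = min(depth + 1, 6)
            if text:
                blocks.append("#" * level + " " + text)
        elif kind == "Table":
            # Prefer the HTML rendering when present; Markdown allows inline HTML.
            body = meta.get("text_as_html") or text
            if body:
                blocks.append(body)
        elif kind in ("Header", "Footer", "PageBreak"):
            # Page furniture is dropped in this sketch.
            continue
        elif text:
            # NarrativeText and every other type fall back to a plain paragraph.
            blocks.append(text)

    flush_list()
    return "\n\n".join(blocks) + "\n"


sample = [
    {"type": "Title", "text": "Intro", "metadata": {"category_depth": 0}},
    {"type": "NarrativeText", "text": "Hello.", "metadata": {}},
    {"type": "ListItem", "text": "first", "metadata": {}},
    {"type": "ListItem", "text": "second", "metadata": {}},
    {"type": "Footer", "text": "page 1", "metadata": {}},
]
print(elements_to_markdown(sample))
```

This loses most of the metadata (page numbers, coordinates), which is exactly why a built-in option that keeps the semantic structure would be preferable to everyone writing their own version.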

pat-ben avatar Aug 15 '24 07:08 pat-ben

I want to second this feature request.

I would like to see a function that converts the output (elements) into Markdown. This would make it much easier to check whether the conversion from the original file was correct or whether errors occurred along the way. At the moment, I have to click through a list of elements, which is very inconvenient for debugging.

Netzeband avatar Jan 27 '25 18:01 Netzeband

I also second this feature!

mishoco avatar Feb 18 '25 17:02 mishoco

This feature would be really helpful.

myteberib avatar Mar 18 '25 08:03 myteberib

It would be very helpful to be able to extract data as Markdown from any file, for example PDF to Markdown or DOCX to Markdown.

raviranjan31 avatar Sep 23 '25 00:09 raviranjan31

This would be really helpful. I would like to see this function and use it in my project.

Emanuellefelix avatar Dec 10 '25 13:12 Emanuellefelix