Is there any way to revert (convert back) a Markdown file to the original file format?

For example: DOCX -> Markdown file -> filled by an LLM -> Markdown file -> DOCX

Jul 16 '25 19:07 Jeremaiha

Thanks @Jeremaiha for the question. Just to clarify, if we already have the original docx or pdf files, is the goal to convert edited markdown back into those formats after LLM changes? Or is this for workflows where only markdown is saved and the original file isn't kept? Just want to understand the use case better.

Jul 17 '25 09:07 tsvlgd

Thank you, @Savvythelegend. To clarify, the use case is that we have an existing template document. We use your library to generate the Markdown and are able to fill the documents, but we're unable to revert back to the original form of the input document.

Specifically, it would be easier if you had the initial template document together with the filled Markdown document.

Jul 17 '25 17:07 Jeremaiha
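Since the original template is available, one option (a sketch, not existing MarkItDown functionality) is to leave the template's formatting alone and only overwrite the paragraphs whose text changed. The minimal sketch below assumes the LLM keeps one paragraph per Markdown line and preserves the template's paragraph order; `markdown_lines` and `plan_template_updates` are hypothetical helpers, and the alignment is done with the standard library's `difflib`:

```python
import difflib
import re

# Leading Markdown block markers: "# " headings, "- "/"* "/"+ " bullets, "1. " items.
_MARKER = re.compile(r"^(#{1,6}\s+|[-*+]\s+|\d+[.)]\s+)")


def markdown_lines(md_text):
    """Return one marker-free entry per non-empty Markdown line."""
    lines = []
    for line in md_text.splitlines():
        line = line.strip()
        if line:
            lines.append(_MARKER.sub("", line, count=1))
    return lines


def plan_template_updates(template_paragraphs, filled_md):
    """Align template paragraphs with filled Markdown lines.

    Returns (updates, unmatched): `updates` maps an index into
    `template_paragraphs` to its new text; `unmatched` lists regions where
    paragraphs were added or removed and no one-to-one mapping exists.
    """
    # Skip empty template paragraphs but remember their original indices.
    indexed = [(i, t.strip()) for i, t in enumerate(template_paragraphs) if t.strip()]
    template = [t for _, t in indexed]
    filled = markdown_lines(filled_md)

    matcher = difflib.SequenceMatcher(a=template, b=filled, autojunk=False)
    updates, unmatched = {}, []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and i2 - i1 == j2 - j1:
            # Same number of paragraphs on both sides: map them one-to-one.
            for k in range(i2 - i1):
                updates[indexed[i1 + k][0]] = filled[j1 + k]
        elif tag != "equal":
            unmatched.append((tag, template[i1:i2], filled[j1:j2]))
    return updates, unmatched
```

With python-docx, the template texts would come from `[p.text for p in doc.paragraphs]`, and each planned update could be applied with `doc.paragraphs[i].text = new_text`. That keeps the paragraph style but replaces the paragraph's runs, so character-level formatting inside a changed paragraph is lost. Entries in `unmatched` (paragraphs the LLM added or dropped) need separate handling.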

👋 Hello! I'd love to work on this feature.
My plan is to build a basic Markdown-to-DOCX converter using python-docx and link it to MarkItDown workflows.
I'll start prototyping and open a draft PR soon. 🚀

Jul 17 '25 21:07 yossefelnggar
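A Markdown-to-DOCX converter built on python-docx also needs to turn inline emphasis into formatted runs instead of flattening it to plain text. A minimal sketch, assuming only non-nested `**bold**` and `*italic*` need handling (no links, code spans, or underscore emphasis); `parse_inline` is a hypothetical helper, not part of MarkItDown or python-docx:

```python
import re

# **bold** is tried before *italic* so double asterisks are not split in two.
_INLINE = re.compile(r"\*\*(.+?)\*\*|\*(.+?)\*")


def parse_inline(text):
    """Split a Markdown line into (text, bold, italic) segments.

    Handles only non-nested **bold** and *italic*; everything else is plain text.
    """
    segments, pos = [], 0
    for m in _INLINE.finditer(text):
        if m.start() > pos:
            # Plain text between the previous match and this one.
            segments.append((text[pos:m.start()], False, False))
        if m.group(1) is not None:
            segments.append((m.group(1), True, False))
        else:
            segments.append((m.group(2), False, True))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], False, False))
    return segments
```

Each segment maps onto one python-docx run: `run = paragraph.add_run(text)`, then `run.bold = bold` and `run.italic = italic`. A fuller converter would more likely walk the `<strong>`/`<em>` tags in the HTML produced by a Markdown parser than rely on a regex.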

Yes, please do. That would be highly beneficial.

Jul 17 '25 22:07 Jeremaiha

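```python
import markdown
from bs4 import BeautifulSoup
from docx import Document


def markdown_to_docx(md_text, output_file="output.docx"):
    """Convert Markdown text to a .docx file via an intermediate HTML tree."""
    html = markdown.markdown(md_text)
    soup = BeautifulSoup(html, "html.parser")
    doc = Document()

    # Visit only the tags we map; other tags and bare strings are skipped.
    for el in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6", "p", "li"]):
        if el.name.startswith("h"):
            # <h1>..<h6> -> Word "Heading 1".."Heading 6"
            doc.add_heading(el.get_text(), level=int(el.name[1]))
        elif el.name == "p":
            # A <p> inside a loose list item is already covered by its <li>.
            if el.find_parent("li") is None:
                doc.add_paragraph(el.get_text())
        elif el.name == "li":
            # The built-in "List Bullet" style draws the bullet itself,
            # so no literal "•" is prepended.
            doc.add_paragraph(el.get_text(strip=True), style="List Bullet")

    doc.save(output_file)
    print(f"✅ Saved: {output_file}")
```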

Jul 18 '25 16:07 yossefelnggar

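```python
import html

import html2text
from docx import Document


def docx_to_markdown(input_file="input.docx"):
    """Convert a .docx file to Markdown via an intermediate HTML string."""
    doc = Document(input_file)
    parts = []
    in_list = False

    for para in doc.paragraphs:
        style = para.style.name
        text = para.text.strip()
        if not text:
            continue
        text = html.escape(text)  # keep <, > and & from being read as markup

        is_item = style.startswith("List")
        if is_item != in_list:  # open/close <ul> around runs of list items
            parts.append("<ul>" if is_item else "</ul>")
            in_list = is_item

        level = style[len("Heading "):]
        if style.startswith("Heading ") and level.isdigit():
            n = min(int(level), 6)
            parts.append(f"<h{n}>{text}</h{n}>")
        elif is_item:
            parts.append(f"<li>{text}</li>")
        else:
            parts.append(f"<p>{text}</p>")

    if in_list:
        parts.append("</ul>")

    return html2text.html2text("\n".join(parts))
```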

Jul 18 '25 16:07 yossefelnggar

```python
# Convert from Markdown to Word
markdown_content = """
# Title

Sample text for converting Markdown to Word.

## Subtitle

- First item
- Second item
"""
markdown_to_docx(markdown_content, "markdown_to_word.docx")

# Convert from Word to Markdown
md_result = docx_to_markdown("markdown_to_word.docx")
print("✅ Resulting Markdown from the file:")
print(md_result)
```

Jul 18 '25 16:07 yossefelnggar

Hi team 👋

I'm submitting this message to confirm that I’ve implemented the full bi-directional conversion feature: Markdown ↔️ Word (DOCX). Due to a temporary issue while creating the pull request, I’ve provided the full working code and explanation here in the issue for now.

- ✅ Markdown ➝ Word: using python-docx
- ✅ Word ➝ Markdown: using html2text + mammoth or equivalent parsing logic

I will finalize the PR once the issue is resolved.

This is the first implementation of its kind and could greatly expand MarkItDown’s capabilities. Let me know if you'd like me to package it as a module.

Thanks 🙏
— @yossefelnggar

Jul 18 '25 16:07 yossefelnggar
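The Word ➝ Markdown direction above mentions "equivalent parsing logic". As one hedged illustration of what that could mean without extra dependencies: a DOCX file is a ZIP archive whose body lives in `word/document.xml`, paragraphs are `w:p` elements, the paragraph style ID sits in `w:pPr/w:pStyle/@w:val`, and the text is spread across `w:t` elements. The sketch below reads only that part; `read_docx_paragraphs` is a hypothetical helper, and it returns style IDs such as `Heading1` rather than display names like `Heading 1`, which python-docx reports:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def read_docx_paragraphs(source):
    """Return a (style_id, text) pair for every paragraph in the main document part.

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{W}p"):
        style_el = p.find(f"{W}pPr/{W}pStyle")
        # pStyle stores the style ID (e.g. "Heading1"), not the display name.
        style_id = style_el.get(f"{W}val") if style_el is not None else None
        # A paragraph's text is usually split over several runs (w:r/w:t).
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        paragraphs.append((style_id, text))
    return paragraphs
```

Because `iter` walks the whole body, paragraphs inside tables are included; headers, footers, footnotes, and comments live in other parts of the archive and are ignored.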

Any updates on this feature? It would be useful to have for DOCX files and, of course, other basic file types as well.

Sep 24 '25 20:09 lucasrothman

@yossefelnggar, could you provide a snippet of the code?

Oct 04 '25 11:10 Jeremaiha

MARKITUP - Revert back from markdown to original document
