AST is incomplete?

Open cellularmitosis opened this issue 6 years ago • 4 comments

Hello,

I was interested in getting the AST from mistune so that I could manipulate it, then have mistune render the modified AST.

I created a couple of functions to do this:

#!/usr/bin/env python3

import mistune
assert mistune.__version__ == '2.0.0a1'

def md_to_ast(md_text):
    return mistune.markdown(md_text, renderer='ast')

def ast_to_html(ast):
    r = mistune.HTMLRenderer()
    md = mistune.Markdown(r)
    def inject_ast(self, tokens, _):
        return ast
    md.before_render_hooks.append(inject_ast)
    html = md.parse('')
    return html

This works for simple cases:

assert mistune.markdown('hello') == ast_to_html(md_to_ast('hello'))
assert mistune.markdown('hello *world*') == ast_to_html(md_to_ast('hello *world*'))

but falls down in other cases:

assert mistune.markdown('# title\nfoo') == ast_to_html(md_to_ast('# title\nfoo'))
Traceback (most recent call last):
  File "./test.py", line 20, in <module>
    assert mistune.markdown('# title\nfoo') == ast_to_html(md_to_ast('# title\nfoo'))
  File "./test.py", line 15, in ast_to_html
    html = md.parse('')
  File "/home/cell/github/cellularmitosis/junk/retainn/mistune-v2/mistune/markdown.py", line 54, in parse
    result = self.block.render(tokens, self.inline, state)
  File "/home/cell/github/cellularmitosis/junk/retainn/mistune-v2/mistune/block_parser.py", line 246, in render
    return ''.join(data)
  File "/home/cell/github/cellularmitosis/junk/retainn/mistune-v2/mistune/block_parser.py", line 265, in _iter_render
    yield method(children)
TypeError: heading() missing 1 required positional argument: 'level'

It looks like the AstRenderer might be dropping some information?

Ideally, a markdown -> AST -> markdown workflow would be supported.

cellularmitosis, Feb 08 '20 21:02

I had also tried getting direct access to the AST, but that didn't seem to produce a correct AST (e.g., for hello *world* it didn't even recognize the emphasis):

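import copy

import mistune

def md_to_ast(md_text):
    r = mistune.AstRenderer()
    md = mistune.Markdown(r)
    ast = None
    # Capture the block tokens just before they are rendered. In mistune v2
    # these still hold unparsed inline text: inline syntax such as *emphasis*
    # is only parsed during rendering, so it does not show up here.
    def grab_ast(self, tokens, _):
        nonlocal ast
        ast = copy.copy(tokens)
        return tokens
    md.before_render_hooks.append(grab_ast)
    _ = md.parse(md_text)
    return ast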

cellularmitosis, Feb 08 '20 21:02

You have misunderstood how mistune works: it does not follow a text -> AST -> HTML pipeline. The AstRenderer is there to print out the structure of the AST; its output is not meant to be fed to HTMLRenderer.

lepture, Feb 09 '20 14:02

@lepture I had a similar idea using markdown -> AST -> markdown. I would like to use the AST to search for something in the markdown and, after finding it, return parts of the original markdown. Being able to convert (parts of) the AST back to markdown would help a lot. Do you have a suggestion for how to accomplish that?

soerface, Feb 24 '20 02:02
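
One way to approach the search half, sketched on hand-written token dicts shaped like mistune v3's AST output (`type`, `attrs`, `children`, `raw`). That shape is an assumption here, and `plain_text` and `find_blocks` are hypothetical helpers, not mistune API. Each top-level block is flattened to plain text and tested with a predicate; the matching tokens form an ordinary token list that a markdown renderer could then turn back into markdown.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Token = Dict[str, Any]


def plain_text(token: Token) -> str:
    """Concatenate the text of a token and all of its descendants."""
    if "children" in token:
        return "".join(plain_text(child) for child in token["children"])
    # Leaf tokens such as text/codespan carry their content in "raw";
    # tokens like blank_line have no text at all.
    return token.get("raw", "")


def find_blocks(
    tokens: List[Token], predicate: Callable[[str], bool]
) -> Iterator[Tuple[int, Token]]:
    """Yield (index, token) for every top-level block whose text matches."""
    for index, token in enumerate(tokens):
        if predicate(plain_text(token)):
            yield index, token


# Hand-written tokens in an assumed v3-style shape (not generated by mistune).
tokens = [
    {"type": "heading", "attrs": {"level": 1},
     "children": [{"type": "text", "raw": "Install"}]},
    {"type": "blank_line"},
    {"type": "paragraph", "children": [
        {"type": "text", "raw": "Run "},
        {"type": "codespan", "raw": "pip install mistune"},
        {"type": "text", "raw": " first."},
    ]},
    {"type": "blank_line"},
    {"type": "paragraph", "children": [{"type": "text", "raw": "Unrelated text."}]},
]

hits = list(find_blocks(tokens, lambda text: "pip install" in text))
print([index for index, _ in hits])  # [2]
```

Searching only top-level blocks keeps each hit self-contained; matching nested tokens instead would mean walking `children` and deciding which enclosing block to return.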

I've been looking for something like this for a while too, and was happy to find that mistune supports MD -> AST, but I couldn't find a way to go from AST -> MD again. Is this a planned feature, or is there already a way to do it?
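
For a sense of what the AST -> MD direction involves, here is a minimal sketch of a hand-rolled renderer over hand-written token dicts in an assumed v3-style shape (real mistune output has many more token types and fields). `render_inline` and `render_blocks` are hypothetical names. It covers only headings, paragraphs, flat unordered lists and a few inline types, and it does not escape text, so a literal `*` inside a text node would be re-read as emphasis.

```python
from typing import Any, Dict, List

Token = Dict[str, Any]


def render_inline(children: List[Token]) -> str:
    """Turn a list of inline tokens back into markdown source."""
    out = []
    for tok in children:
        kind = tok["type"]
        if kind == "text":
            out.append(tok["raw"])  # NOTE: no escaping of *, _, ` etc.
        elif kind == "emphasis":
            out.append("*" + render_inline(tok["children"]) + "*")
        elif kind == "strong":
            out.append("**" + render_inline(tok["children"]) + "**")
        elif kind == "codespan":
            out.append("`" + tok["raw"] + "`")
        else:
            raise NotImplementedError(f"inline token {kind!r}")
    return "".join(out)


def render_blocks(tokens: List[Token]) -> str:
    """Turn a flat list of block tokens back into markdown source."""
    blocks = []
    for tok in tokens:
        kind = tok["type"]
        if kind == "blank_line":
            continue  # blocks are re-separated with blank lines below
        if kind == "heading":
            level = tok["attrs"]["level"]
            blocks.append("#" * level + " " + render_inline(tok["children"]))
        elif kind == "paragraph":
            blocks.append(render_inline(tok["children"]))
        elif kind == "list":
            # Only flat, unordered lists whose items hold a single text block.
            bullet = tok.get("bullet", "-")
            items = [f"{bullet} {render_inline(item['children'][0]['children'])}"
                     for item in tok["children"]]
            blocks.append("\n".join(items))
        else:
            raise NotImplementedError(f"block token {kind!r}")
    return "\n\n".join(blocks) + "\n"


def text(raw: str) -> Token:
    return {"type": "text", "raw": raw}


def item(raw: str) -> Token:
    return {"type": "list_item",
            "children": [{"type": "block_text", "children": [text(raw)]}]}


tokens = [
    {"type": "heading", "attrs": {"level": 2}, "children": [text("Title")]},
    {"type": "blank_line"},
    {"type": "paragraph",
     "children": [text("hello "), {"type": "emphasis", "children": [text("world")]}]},
    {"type": "blank_line"},
    {"type": "list", "bullet": "*", "children": [item("a"), item("b")]},
]

print(render_blocks(tokens))
```

Escaping, nested lists, links, code blocks and the remaining token types are where most of the real work in such a renderer lies.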

On another note: it would be nice if the AST were tree-based instead of a long, flat list of objects, with an API for searching the tree, so that I could, for instance, easily grab the content of "header1/subheader2".
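
The top-level token list is flat (a heading does not contain the content that follows it), but a section tree like the one described can be derived from it afterwards. A minimal sketch, again on hand-written tokens in an assumed v3-style shape, with hypothetical helper names (`build_section_tree`, `find_section`): a stack of open sections is popped whenever a heading of the same or a higher level appears.

```python
from typing import Any, Dict, List, Optional

Token = Dict[str, Any]
Section = Dict[str, Any]


def heading_text(tok: Token) -> str:
    # Assumes the heading's children are plain text tokens.
    return "".join(child.get("raw", "") for child in tok["children"])


def build_section_tree(tokens: List[Token]) -> Section:
    """Nest a flat token list under its headings by heading level."""
    root: Section = {"title": None, "level": 0, "body": [], "sections": []}
    stack = [root]  # path from the root to the innermost open section
    for tok in tokens:
        if tok["type"] == "heading":
            level = tok["attrs"]["level"]
            # Close every open section at the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            node: Section = {"title": heading_text(tok), "level": level,
                             "heading": tok, "body": [], "sections": []}
            stack[-1]["sections"].append(node)
            stack.append(node)
        else:
            # Content before the first heading ends up in root["body"].
            stack[-1]["body"].append(tok)
    return root


def find_section(root: Section, path: str) -> Optional[Section]:
    """Look up a section by slash-separated heading titles, e.g. "a/b"."""
    node = root
    for title in path.split("/"):
        found = next((s for s in node["sections"] if s["title"] == title), None)
        if found is None:
            return None
        node = found
    return node


def heading(level: int, raw: str) -> Token:
    return {"type": "heading", "attrs": {"level": level},
            "children": [{"type": "text", "raw": raw}]}


def para(raw: str) -> Token:
    return {"type": "paragraph", "children": [{"type": "text", "raw": raw}]}


tokens = [
    heading(1, "header1"), para("intro"),
    heading(2, "subheader1"), para("a"),
    heading(2, "subheader2"), para("b"),
    heading(1, "header2"), para("c"),
]

tree = build_section_tree(tokens)
section = find_section(tree, "header1/subheader2")
print([block["children"][0]["raw"] for block in section["body"]])  # ['b']
```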

My use case: I have a document and need to extract everything within a specific section, delimited by heading level (e.g. grab the content of "header1/subheader2" and write it to a new markdown file).

sunflowr, Apr 24 '20 23:04
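
Extracting one section does not strictly need a tree: a single pass over the flat list can track the headings that enclose the current position. A minimal sketch under the same assumptions (hand-written tokens in an assumed v3-style shape, hypothetical helper names such as `extract_section`). The result is the matched heading plus everything up to the next heading of the same or a higher level, subsections included.

```python
from typing import Any, Dict, List, Optional, Tuple

Token = Dict[str, Any]


def heading_text(tok: Token) -> str:
    # Assumes the heading's children are plain text tokens.
    return "".join(child.get("raw", "") for child in tok["children"])


def extract_section(tokens: List[Token], path: str) -> List[Token]:
    """Return the heading named by `path` plus everything up to the next
    heading of the same or a higher level (subsections included)."""
    wanted = path.split("/")
    stack: List[Tuple[int, str]] = []  # (level, title) of enclosing headings
    start: Optional[int] = None
    level: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok["type"] != "heading":
            continue
        lvl = tok["attrs"]["level"]
        if start is not None and lvl <= level:
            return tokens[start:i]  # the matched section ends here
        while stack and stack[-1][0] >= lvl:
            stack.pop()
        stack.append((lvl, heading_text(tok)))
        if start is None and [title for _, title in stack] == wanted:
            start, level = i, lvl
    return tokens[start:] if start is not None else []


def heading(level: int, raw: str) -> Token:
    return {"type": "heading", "attrs": {"level": level},
            "children": [{"type": "text", "raw": raw}]}


def para(raw: str) -> Token:
    return {"type": "paragraph", "children": [{"type": "text", "raw": raw}]}


tokens = [
    heading(1, "header1"), para("intro"),
    heading(2, "subheader1"), para("a"),
    heading(2, "subheader2"), para("b"),
    heading(3, "detail"), para("c"),
    heading(2, "subheader3"), para("d"),
]

section = extract_section(tokens, "header1/subheader2")
print([heading_text(t) if t["type"] == "heading" else t["children"][0]["raw"]
       for t in section])  # ['subheader2', 'b', 'detail', 'c']
```

Since the result is still a flat token list, in v3 it could be handed to `MarkdownRenderer` to write the new file. Headings are matched on their plain text only, and the first match wins if a path occurs twice.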

In v3, a parser created without a renderer outputs the AST. Fixed in v3.

lepture, Sep 26 '22 14:09

I couldn't find any docs on this, but here's how to do it in v3:

"""
pip install mistune==3.0.0rc4
"""
import mistune
from mistune.renderers.markdown import MarkdownRenderer
from mistune.core import BlockState

markdown = mistune.create_markdown(renderer=None)

text = """

# heading

- item 1
- item 2
"""

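tokens = markdown(text)

# tokens[0] is the blank_line token for the leading newlines,
# tokens[1] is the heading and tokens[3] is the list
tokens[1]["children"][0]["raw"] = "new heading"
tokens[3]["bullet"] = "*"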

renderer = MarkdownRenderer()

print(renderer(tokens, state=BlockState()))

edublancas, Jan 14 '23 22:01