AST is incomplete?
Hello,
I was interested in getting the AST from mistune so that I could manipulate it, then have mistune render the modified AST.
I created a couple of functions to do this:
#!/usr/bin/env python3
import mistune
assert mistune.__version__ == '2.0.0a1'

def md_to_ast(md_text):
    return mistune.markdown(md_text, renderer='ast')

def ast_to_html(ast):
    r = mistune.HTMLRenderer()
    md = mistune.Markdown(r)
    def inject_ast(self, tokens, _):
        return ast
    md.before_render_hooks.append(inject_ast)
    html = md.parse('')
    return html
This works for simple cases:
assert mistune.markdown('hello') == ast_to_html(md_to_ast('hello'))
assert mistune.markdown('hello *world*') == ast_to_html(md_to_ast('hello *world*'))
but falls down in other cases:
assert mistune.markdown('# title\nfoo') == ast_to_html(md_to_ast('# title\nfoo'))
Traceback (most recent call last):
  File "./test.py", line 20, in <module>
    assert mistune.markdown('# title\nfoo') == ast_to_html(md_to_ast('# title\nfoo'))
  File "./test.py", line 15, in ast_to_html
    html = md.parse('')
  File "/home/cell/github/cellularmitosis/junk/retainn/mistune-v2/mistune/markdown.py", line 54, in parse
    result = self.block.render(tokens, self.inline, state)
  File "/home/cell/github/cellularmitosis/junk/retainn/mistune-v2/mistune/block_parser.py", line 246, in render
    return ''.join(data)
  File "/home/cell/github/cellularmitosis/junk/retainn/mistune-v2/mistune/block_parser.py", line 265, in _iter_render
    yield method(children)
TypeError: heading() missing 1 required positional argument: 'level'
It looks like the AstRenderer might be dropping some information?
Ideally, it would be great to have a markdown -> AST -> markdown workflow supported.
I also tried getting direct access to the AST through a hook, but that didn't seem to produce a correct AST (e.g., for hello *world* the emphasis wasn't even recognized).
import copy

def md_to_ast(md_text):
    r = mistune.AstRenderer()
    md = mistune.Markdown(r)
    ast = None
    def grab_ast(self, tokens, _):
        # Keep a (shallow) copy of the tokens as they are just before rendering.
        nonlocal ast
        ast = copy.copy(tokens)
        return tokens
    md.before_render_hooks.append(grab_ast)
    _ = md.parse(md_text)
    return ast
You have misunderstood how mistune works. It doesn't work in your text -> AST -> HTML way. The AstRenderer is there to print out the structure of the AST; its output is not meant to be fed to HTMLRenderer.
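To make the mismatch concrete, here is a rough model of the failure in the traceback above. This is not mistune's source: `render_tokens`, the `params` key, and the token shapes are assumptions made for illustration, guessed from the `yield method(children)` line and the `TypeError`. The idea is only that the render loop expects extra arguments in one place, while the AST output keeps them under a named key such as `level`.

```python
# Simplified model of the failure; this is NOT mistune's actual code.
# Assumption: the render loop passes extra arguments only when a token
# carries them under a "params" key, while AST-style nodes store the
# same information under named keys such as "level".

def heading(text, level):
    """Stand-in for an HTML renderer method that requires the level."""
    return f"<h{level}>{text}</h{level}>\n"

def paragraph(text):
    """Stand-in for an HTML renderer method with no extra arguments."""
    return f"<p>{text}</p>\n"

METHODS = {"heading": heading, "paragraph": paragraph}

def render_tokens(tokens):
    """Call the renderer method for each token, forwarding "params" if present."""
    out = []
    for tok in tokens:
        method = METHODS[tok["type"]]
        params = tok.get("params")
        if params:
            out.append(method(tok["text"], *params))
        else:
            out.append(method(tok["text"]))
    return "".join(out)

# Shape the (hypothetical) render loop expects.
internal_tokens = [
    {"type": "heading", "text": "title", "params": (1,)},
    {"type": "paragraph", "text": "foo"},
]

# Shape of AST-style output: the level is there, but under a different key.
ast_like_nodes = [
    {"type": "heading", "text": "title", "level": 1},
    {"type": "paragraph", "text": "foo"},
]

html = render_tokens(internal_tokens)

try:
    render_tokens(ast_like_nodes)
    error = None
except TypeError as exc:
    error = str(exc)
```

In this model the information is not dropped. The loop simply never looks at the `level` key, so `heading()` is called with one argument too few, which matches the error reported above.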
@lepture I had a similar idea using markdown -> AST -> markdown. I would like to use the AST to search for something in the markdown, and after finding it, returning parts of the original markdown. Therefore, it would help a lot to convert (parts of) the AST back to markdown again. Do you have a suggestion of how to accomplish that?
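For the "search the AST" half of this, a small walker over the token list may be enough. The sketch below is self-contained and does not import mistune. It assumes tokens are plain dicts with a `type`, an optional `raw` string, and optional `children`; `walk`, `find_text`, and the sample data are hypothetical names made up for the example.

```python
def walk(tokens):
    """Yield every token in a nested token list, depth first."""
    for tok in tokens:
        yield tok
        yield from walk(tok.get("children", []))

def find_text(tokens, needle):
    """Return all text tokens whose raw content contains `needle` (case-sensitive)."""
    return [
        tok for tok in walk(tokens)
        if tok.get("type") == "text" and needle in tok.get("raw", "")
    ]

# Hand-written sample in the assumed token shape.
sample = [
    {"type": "heading", "attrs": {"level": 1},
     "children": [{"type": "text", "raw": "Install"}]},
    {"type": "paragraph", "children": [
        {"type": "text", "raw": "Run "},
        {"type": "emphasis",
         "children": [{"type": "text", "raw": "pip install"}]},
    ]},
]

matches = find_text(sample, "install")
```

This only locates nodes. Turning the matched part of the tree back into markdown still needs a renderer that emits markdown.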
I've been looking for something like this for a while too. I was happy to find that mistune has MD -> AST, but I couldn't find a way of going AST -> MD again. Is this a planned feature, or is there already a way of doing this?
On another note: it would be nice if the AST were tree-based instead of a long list of objects, and had an API for searching the tree, so I could, for instance, easily grab the content of "header1/subheader2".
My use case is that I have a document and need to extract everything within a specific section denoted by header level (e.g. grab the content of "header1/subheader2" and output it to a new markdown file).
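Until something like that exists, a section lookup can be layered on top of the flat list. The sketch below is one possible approach, not a mistune API: `extract_section`, `heading_text`, and the helpers `h` and `p` are hypothetical, and it assumes heading tokens look like `{"type": "heading", "attrs": {"level": n}, "children": [...]}`. A section is taken to run from its heading to the next heading of the same or a higher rank.

```python
def heading_text(tok):
    """Concatenate the raw text of a token's direct children."""
    return "".join(child.get("raw", "") for child in tok.get("children", []))

def extract_section(tokens, path):
    """Return the block tokens under a heading path such as ["header1", "subheader2"]."""
    start, end = 0, len(tokens)
    for title in path:
        # Find the heading with this title inside the current window.
        found = None
        for i in range(start, end):
            tok = tokens[i]
            if tok["type"] == "heading" and heading_text(tok) == title:
                found = i
                break
        if found is None:
            return []
        level = tokens[found]["attrs"]["level"]
        start = found + 1
        # Narrow the window: it ends at the next heading of equal or higher rank.
        for j in range(start, end):
            tok = tokens[j]
            if tok["type"] == "heading" and tok["attrs"]["level"] <= level:
                end = j
                break
    return tokens[start:end]

def h(level, text):
    """Build a heading token in the assumed shape."""
    return {"type": "heading", "attrs": {"level": level},
            "children": [{"type": "text", "raw": text}]}

def p(text):
    """Build a paragraph token in the assumed shape."""
    return {"type": "paragraph", "children": [{"type": "text", "raw": text}]}

doc = [
    h(1, "header1"), p("intro"),
    h(2, "subheader1"), p("a"),
    h(2, "subheader2"), p("b"), p("c"),
    h(1, "header2"), p("d"),
]

section = extract_section(doc, ["header1", "subheader2"])
```

The returned slice is still a list of tokens, so writing it to a new markdown file depends on having a renderer that can emit markdown from tokens.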
In v3, mistune outputs the AST by default when no renderer is given. Fixed in v3.
I couldn't find any docs on this, but here's how to do it in v3:
"""
pip install mistune==3.0.0rc4
"""
import mistune
from mistune.renderers.markdown import MarkdownRenderer
from mistune.core import BlockState
markdown = mistune.create_markdown(renderer=None)
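text = """
# heading

- item 1
- item 2
"""
# With renderer=None, calling the parser returns the list of tokens (the AST).
tokens = markdown(text)
# tokens[0] and tokens[2] are blank lines, so the heading is tokens[1]
# and the list is tokens[3].
tokens[1]["children"][0]["raw"] = "new heading"
tokens[3]["bullet"] = "*"
# Render the modified tokens back to markdown.
renderer = MarkdownRenderer()
print(renderer(tokens, state=BlockState()))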