How can I convert the output of lark().parse() into JSON

Open ghost opened this issue 5 years ago • 4 comments

What is your question?

I have a working grammar which I can parse into an abstract syntax tree. I can output it to the console as an object, as a "pretty" version or to a png picture. Is it possible to convert this into a JSON representation so that I can use the AST in another program?

If you're having trouble with your code or grammar

import json
from lark import Lark, tree

with open('grammar.lark', 'r') as grammar:
    parser = Lark(grammar, start='start')

with open('vegancupcakes.json', 'r') as recipe:
    for i in json.load(recipe):
        parsed = parser.parse(i)  # parse each entry once and reuse the result
        print(parsed.pretty())
        tree.pydot__tree_to_png(parsed, 'ast.png')

I tried to look in the documentation and in the source code but I wasn't able to find if the tree class has any other methods. Sorry if it's there but I was unable to find it. What I'm looking for is a method like tree.pydot__tree_to_png() but to JSON instead of png. Failing that, is there another library I can use or an alternative way of achieving the same thing?

Thank you

ghost avatar Jul 12 '20 16:07 ghost

Is there a problem with creating your own method? It is not very complex.

MegaIng avatar Jul 12 '20 18:07 MegaIng

@leliamesteban You should implement a Transformer that turns the tree into JSON. There are no built-in methods for that in Lark.

erezsh avatar Jul 16 '20 19:07 erezsh

I have created a standalone example containing functions tree_to_json_str and tree_to_json: https://gist.github.com/charles-esterbrook/9ab557d70391fd85ebac2b1a59a326cf

You can adapt it to your needs. Another approach would be to convert the tree to Python data and then use json.dump/dumps.
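
A rough sketch of that second approach (not the code from the gist). It assumes inner nodes expose .data and .children the way lark.Tree does, and that leaves are string-like objects with a .type attribute the way lark.Token is. The function name tree_to_data and the dict layout are hypothetical:

```python
import json


def tree_to_data(node):
    """Recursively convert a Lark-style tree into dicts, lists and strings."""
    if hasattr(node, "children"):
        # Inner node: the rule name lives in .data (as on lark.Tree).
        return {
            "type": str(node.data),
            "children": [tree_to_data(child) for child in node.children],
        }
    # Leaf: a str subclass with the terminal name in .type (as on lark.Token).
    return {"type": getattr(node, "type", None), "value": str(node)}


def tree_to_json_text(node, **dumps_kwargs):
    """Serialize the converted tree; keyword arguments go to json.dumps."""
    return json.dumps(tree_to_data(node), **dumps_kwargs)
```

With the parser from the question, tree_to_json_text(parser.parse(i), indent=2) should then give the JSON text for one input.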

This ticket can be closed.

charles-esterbrook avatar Nov 14 '20 14:11 charles-esterbrook

Not the question asked, but after having used the script of @charles-esterbrook I realised that the YAML format may be a better fit for an AST. The code is cleaner, too:

from lark import Token

def ast_to_yaml(node, indent=""):
    indent += "  "
    if isinstance(node, Token):
        yield f"{indent}- type: {node.type}"
        yield f"{indent}  value: {repr(node.value)}"
    else:
        yield f"{indent}- type: {node.data}"
        yield f"{indent}  children:"
        for child in node.children:
            yield from ast_to_yaml(child, indent)

Example of output:

  - type: start
    children:
    - type: line
      children:
      - type: entity_clause
        children:
        - type: entity_name_def
          children:
          - type: box_name
            children:
            - type: BOX_NAME
              value: 'FOO'
        - type: COLON
          value: ': '
        - type: seq
          children:
          - type: entity_or_table_attr
            children:
            - type: typed_attr
              children:
              - type: attr
                children:
                - type: ATTR
                  value: 'bar'
        - type: NL
          value: '\n'
    - type: line
      children:
      - type: NL
        value: '\n'

laowantong avatar Sep 18 '23 11:09 laowantong