py-tree-sitter icon indicating copy to clipboard operation
py-tree-sitter copied to clipboard

Walking Syntax Trees

Open celsofranssa opened this issue 3 years ago • 4 comments

I am currently using the following method to walk the tree:

def traverse(tree):
    def _traverse(node):
        print_node(node)

        for child in node.children:
            _traverse(child)

    _traverse(tree.root_node)

How could someone do the same using TreeCursor?

celsofranssa avatar Sep 09 '20 00:09 celsofranssa

Issue Label Bot is not confident enough to auto-label this issue. See dashboard for more details.

issue-label-bot[bot] avatar Sep 09 '20 00:09 issue-label-bot[bot]

Elsewhere a similar question was asked and I think one of the suggestions was to look at: https://github.com/tree-sitter/tree-sitter/blob/de53b82e2c48d6221af9881e50cc57d961de3365/lib/binding_web/binding.c#L495-L540

The code mentioned above does a bit more than what is asked in this issue, and perhaps the following version is closer to the request:

def make_move(cursor, move, fn):
    # think of the parameter "move" as the move that was made to
    # arrive at the present node
    if (move == "down"):
        fn(cursor)
        if (cursor.goto_first_child()):
            make_move(cursor, "down", fn)
        elif (cursor.goto_next_sibling()):
            make_move(cursor, "right", fn)
        elif (cursor.goto_parent()):
            make_move(cursor, "up", fn)
    elif (move == "right"):
        fn(cursor)
        if (cursor.goto_first_child()):
            make_move(cursor, "down", fn)
        elif (cursor.goto_next_sibling()):
            make_move(cursor, "right", fn)
        elif (cursor.goto_parent()):
            make_move(cursor, "up", fn)
    elif move == "up":
        if (cursor.goto_next_sibling()):
            make_move(cursor, "right", fn)
        elif (cursor.goto_parent()):
            make_move(cursor, "up", fn)

def my_fn(cursor):
    print(cursor.node.type)

tree = parser.parse(bytes("""
(defn hello
  [param]
  (let [x 1]
    (do
      (print "hi")
      (print "ho"))))
""", "utf8"))

cursor = tree.walk()

# as a special case, imagine one starts "above" the root node and
# moves "down" to it
make_move(cursor, "down", my_fn)

The function "make_move" can be "reduced" but I found some of those versions to be less easy to understand.

sogaiu avatar Sep 09 '20 08:09 sogaiu

I think I've got what you're looking for. I got hit by the maximum stack limit when walking the tree using recursion similar to your example, so now rewritten it to use iteration with TreeCursor. I hope it can be some help to others...

def traverse_tree(tree: Tree):
    cursor = tree.walk()

    reached_root = False
    while reached_root == False:
        yield cursor.node

        if cursor.goto_first_child():
            continue

        if cursor.goto_next_sibling():
            continue

        retracing = True
        while retracing:
            if not cursor.goto_parent():
                retracing = False
                reached_root = True

            if cursor.goto_next_sibling():
                retracing = False

for node in traverse_tree:
  print(node)

bencevans avatar Jun 20 '21 13:06 bencevans

I've found this issue when searching for additional information about how to build traversal usint tree-sitter cursor. For anybody looking how to implement Preorder Depth First traversal using tree-sitter cursor in JavaScript: https://github.com/tree-sitter/tree-sitter/issues/1469#issuecomment-981615589

char0n avatar Nov 29 '21 13:11 char0n

https://github.com/tree-sitter/py-tree-sitter/issues/33#issuecomment-864557166

would be nice to have this in the readme

milahu avatar Jun 22 '23 14:06 milahu

Here's a simpler conversion of the above Python into TypeScript:

function* traverseTree(
  tree: Parser.Tree,
): Generator<Parser.SyntaxNode> {
  const cursor = tree.walk();

  let reachedRoot = false;
  while (!reachedRoot) {
    let currentNode = cursor.currentNode as unknown;
    // TreeCursor.currentNode is a property in Node but a function in the browser
    // https://github.com/tree-sitter/tree-sitter/issues/2195
    if (typeof currentNode === "function") {
      currentNode = currentNode();
    }
    yield currentNode as Parser.SyntaxNode;

    if (cursor.gotoFirstChild()) {
      continue;
    }

    if (cursor.gotoNextSibling()) {
      continue;
    }

    let retracing = true;
    while (retracing) {
      if (!cursor.gotoParent()) {
        retracing = false;
        reachedRoot = true;
      }

      if (cursor.gotoNextSibling()) {
        retracing = false;
      }
    }
  }
}

and into JavaScript

function* traverseTree(tree) {
  const cursor = tree.walk();

  let reachedRoot = false;
  while (!reachedRoot) {
    let currentNode = cursor.currentNode;
    // TreeCursor.currentNode is a property in Node but a function in the browser
    // https://github.com/tree-sitter/tree-sitter/issues/2195
    if (typeof currentNode === "function") {
      currentNode = currentNode();
    }
    yield currentNode;

    if (cursor.gotoFirstChild()) {
      continue;
    }

    if (cursor.gotoNextSibling()) {
      continue;
    }

    let retracing = true;
    while (retracing) {
      if (!cursor.gotoParent()) {
        retracing = false;
        reachedRoot = true;
      }

      if (cursor.gotoNextSibling()) {
        retracing = false;
      }
    }
  }
}

verhovsky avatar Jan 22 '24 19:01 verhovsky

closed this as completed

not really completed... the current readme#walking-syntax-trees is not helpful

milahu avatar Feb 26 '24 14:02 milahu

It's more like a not-planned for upstream, this is for the user to implement and not something we want to have/consider

It's useful though so I could move it into discussions instead?

amaanq avatar Feb 26 '24 14:02 amaanq

I could move it into discussions

this should be in the repo, not hidden on github how about examples/walk_tree.py

this is for the user to implement

i see basically 2 strategies for tree walking: by iterating over a generator function like in https://github.com/tree-sitter/py-tree-sitter/issues/33#issuecomment-864557166 by passing a handle_node callback function like in https://github.com/tree-sitter/py-tree-sitter/issues/33#issuecomment-689426763

the iterator/generator pattern may be more efficient because it avoids the function call overhead

... so this is not something that has a million solutions, where users can get creative

the current readme#walking-syntax-trees feels like "this is the API, go figure..." for someone who never had to write a tree-walker that is not helpful

milahu avatar Feb 26 '24 20:02 milahu

The readme does need improvements, as do the core docs.

ObserverOfTime avatar Feb 26 '24 21:02 ObserverOfTime

I like the idea of an examples dir for this, I'll add that

amaanq avatar Mar 05 '24 16:03 amaanq