py-tree-sitter

Possible memory leak when accessing "text" variable of Nodes

Open Kleinkop opened this issue 3 years ago • 4 comments

Version 0.20.0, Python 3.8 on Ubuntu 20.04.4

I use tree-sitter to parse a large number of Python files, and I keep running out of memory while traversing the parsed trees:

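from typing import List
from tree_sitter import Node

# root_node is the root of an already parsed tree
re = []
nodes_to_expand: List[Node] = [root_node]
while nodes_to_expand:
    node = nodes_to_expand.pop()
    re.append(node.text.decode())
    for child in node.children:
        nodes_to_expand.append(child)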

Replacing

re.append(node.text.decode())

with

re.append(blob[node.start_byte:node.end_byte].decode())

fixed the issue for me (blob is the source bytes that were parsed), which leads me to believe that repeated access to node.text leaks memory.
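For anyone who wants to try the workaround in isolation, here is a minimal sketch of the slicing approach. StubNode and collect_texts are hypothetical names made up for this illustration; StubNode only mimics the start_byte, end_byte and children attributes of a real Node so the snippet runs without a compiled grammar. With real trees you would pass tree.root_node and the bytes you handed to the parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubNode:
    """Hypothetical stand-in for tree_sitter.Node (not part of the library)."""
    start_byte: int
    end_byte: int
    children: List["StubNode"] = field(default_factory=list)


def collect_texts(source: bytes, root) -> List[str]:
    """Walk the tree depth-first, slicing the source buffer instead of reading node.text."""
    texts = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Slice the original bytes using the node's byte offsets.
        texts.append(source[node.start_byte:node.end_byte].decode())
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(node.children))
    return texts


source = b"x = 1"
root = StubNode(0, 5, [StubNode(0, 1), StubNode(2, 3), StubNode(4, 5)])
print(collect_texts(source, root))  # ['x = 1', 'x', '=', '1']
```

The only requirement is that the source bytes stay around for as long as you walk the tree, since the nodes are only used for their offsets.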

Kleinkop • Mar 31 '22 13:03

fixed in e973edc

lunixbochs • Mar 31 '22 15:03

⚡ nice work @lunixbochs

maxbrunsfeld • Mar 31 '22 15:03

Also thanks for the good report @Kleinkop

maxbrunsfeld • Mar 31 '22 15:03

@maxbrunsfeld Would you mind pushing this fix to PyPI? I think it's an important fix.

grotrek • Apr 20 '22 16:04