py-tree-sitter
Possible memory leak when accessing "text" variable of Nodes
Version 0.20.0, Python 3.8 on Ubuntu 20.04.4
I use tree-sitter to parse a large number of Python files, and I run out of memory while traversing the parsed trees:
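from typing import List

from tree_sitter import Node

# root_node: root of the tree parsed from blob
re = []
nodes_to_expand: List[Node] = [root_node]
while nodes_to_expand:
    node = nodes_to_expand.pop()
    re.append(node.text.decode())
    for child in node.children:
        nodes_to_expand.append(child)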
Replacing
re.append(node.text.decode())
with
re.append(blob[node.start_byte:node.end_byte].decode())
fixed the issue for me, which leads me to believe that repeated access to node.text leaks memory.
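For anyone hitting this before a fixed release is available, the workaround can be wrapped in a small helper. This is only a sketch: `iter_node_texts` and `FakeNode` are names made up for illustration and are not part of py-tree-sitter. The helper assumes only that nodes expose `start_byte`, `end_byte` and `children`, as in the snippet above, and that `blob` is the same bytes object that was handed to the parser. `FakeNode` is there so the example runs without a compiled grammar.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing the three node attributes used below."""
    start_byte: int
    end_byte: int
    children: List["FakeNode"] = field(default_factory=list)


def iter_node_texts(blob: bytes, root_node) -> Iterator[str]:
    """Yield the decoded source text of every node without reading node.text."""
    stack = [root_node]
    while stack:
        node = stack.pop()
        # Slice the original buffer instead of going through node.text.
        yield blob[node.start_byte:node.end_byte].decode()
        # Same order as the loop in the report: the last child is popped first.
        stack.extend(node.children)


# Tiny hand-built "tree" for the source b"x = 1" (assumed layout, illustration only).
blob = b"x = 1"
root = FakeNode(0, 5, [FakeNode(0, 1), FakeNode(2, 3), FakeNode(4, 5)])
texts = list(iter_node_texts(blob, root))
```

With a real tree, the call would presumably look like `list(iter_node_texts(blob, tree.root_node))`, where `tree` is whatever the parser returned for `blob`. Writing it as a generator also avoids holding every decoded string in a list unless the caller asks for one.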
fixed in e973edc
⚡ nice work @lunixbochs
Also thanks for the good report @Kleinkop
@maxbrunsfeld Would you mind pushing this fix to PyPI? I think it's an important fix.