Segfault when sorting captures by `start_point`
It is possible to crash tree-sitter when many captures are sorted.
Valgrind output:
==7579== Invalid read of size 4
==7579== at 0x75BA4D8: ts_query_end_byte_for_pattern (query.c:2887)
==7579== by 0x75A54D4: query_end_byte_for_pattern (query.c:699)
==7579== by 0x4A5B48D: method_vectorcall_VARARGS (descrobject.c:324)
==7579== by 0x49A9DD6: UnknownInlinedFun (pycore_call.h:168)
==7579== by 0x49A9DD6: PyObject_Vectorcall (call.c:327)
==7579== by 0x49BA14E: _PyEval_EvalFrameDefault (generated_cases.c.h:1843)
==7579== by 0x4A146B0: UnknownInlinedFun (pycore_ceval.h:119)
==7579== by 0x4A146B0: UnknownInlinedFun (ceval.c:1816)
==7579== by 0x4A146B0: UnknownInlinedFun (call.c:413)
==7579== by 0x4A146B0: UnknownInlinedFun (pycore_call.h:168)
==7579== by 0x4A146B0: method_vectorcall (classobject.c:62)
==7579== by 0x4A99B37: UnknownInlinedFun (call.c:285)
==7579== by 0x4A99B37: _PyObject_Call (call.c:348)
==7579== by 0x49BE5F2: UnknownInlinedFun (call.c:373)
==7579== by 0x49BE5F2: UnknownInlinedFun (call.c:381)
==7579== by 0x49BE5F2: _PyEval_EvalFrameDefault (generated_cases.c.h:1355)
==7579== by 0x4A8C03A: PyEval_EvalCode (ceval.c:604)
==7579== by 0x4ACAD22: run_eval_code_obj (pythonrun.c:1381)
==7579== by 0x4AC8342: run_mod (pythonrun.c:1466)
==7579== by 0x4AC4DD5: pyrun_file (pythonrun.c:1295)
==7579== Address 0x621e33c is 39,372 bytes inside an unallocated block of size 145,536 in arena "client"
==7579==
The code which triggers it:
def query_captures_22_3(query: Query, node: Node) -> list[tuple[Node, str]]:
    result = list()
    captures = query.captures(node)
    captures_sorted = dict()
    # Commenting out these lines will prevent the segfault
    # FAULTY BEGIN
    nodes: list[Node]
    for name, nodes in captures.items():
        captures_sorted[name] = sorted(nodes, key=lambda n: n.start_point)
    # FAULTY END
    while len(captures_sorted) != 0:
        for name, nodes in captures_sorted.items():
            node = nodes.pop(0)
            result.append((node, name))
        captures_sorted = {k: l for k, l in captures_sorted.items() if len(l) != 0}
    return result
The crash is somewhat random: it segfaults most of the time, in roughly 8 out of 10 runs, but not always.
Up to 555 nodes are sorted in the captures dict. When it segfaults, it always does so after sorting the same captures and after the function has returned.
For whatever reason, I can't reproduce it when the script is run in PyCharm, only from the command line.
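To put a number on the "roughly 8 out of 10" figure, a small driver along these lines could rerun the script and count the segfaults. This is a sketch only: `count_segfaults` is a hypothetical helper, and it assumes a POSIX system, where `subprocess` reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd: list[str], runs: int = 10) -> int:
    """Run `cmd` `runs` times and count how many runs died with SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a return code of -N means "terminated by signal N".
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

It would be called as `count_segfaults([sys.executable, "repro.py"])`, where `repro.py` is a placeholder for the script that triggers the crash.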
@ObserverOfTime Is there any chance you could find time to fix this soon? If not, that's OK, but then I'd like to plan for a workaround, because v22 no longer builds with Python 3.13.
That version is not supported. Does the crash occur in the latest version and/or master branch?
The crash occurs with the latest release (24.0), run with Python ~~3.13~~ 3.12.
Can you share a file and query that results in the crash?
Sorry, I should have given you a minimal reproducible example from the start. It will take a little while to isolate the code from our tool. I will report back soon.
Done: https://github.com/Rot127/ts-py-debug Note that it only crashes reliably on 3.12, not on 3.13 as I said before. I think I had the wrong venv enabled.
@ObserverOfTime have you had a chance to look at this issue?
@Rot127 The issue is that you're using the byte offset instead of the pattern index. https://github.com/Rot127/ts-py-debug/blob/main/patches/StreamOperation.py#L118
Fixed the crash by raising an IndexError if the supplied number exceeds the pattern count.
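The shape of that guard can be sketched in Python. This only illustrates the bounds check described above and is not the actual patch (the crashing frame is in C, in `query.c`). `check_pattern_index` is a hypothetical helper name.

```python
def check_pattern_index(index: int, pattern_count: int) -> int:
    """Return `index` if it names one of `pattern_count` patterns, else raise."""
    if not 0 <= index < pattern_count:
        raise IndexError(
            f"pattern index {index} out of range for {pattern_count} patterns"
        )
    return index


# A valid pattern index passes through unchanged.
check_pattern_index(2, pattern_count=5)

# A byte offset passed by mistake is usually far larger than the pattern
# count, so it now fails loudly instead of reading invalid memory.
try:
    check_pattern_index(1200, pattern_count=5)
except IndexError as exc:
    print(exc)
```

On the caller's side, the corresponding change is to pass the pattern's index within the query rather than a byte position in the source.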
@ObserverOfTime Thanks a lot!