py-spy
py-spy copied to clipboard
py-spy greenlet stacks on Python 3.9
Troublemaker back again! py-spy seems to be the only Python statistical profiler that supports profiling Cython extensions and for that I am very grateful. I am again attempting to profile Cython extensions using the --native option and generally doing fine with the latest versions of everything on Python 3.6:
Cython 0.29.30
gevent 21.12.0
greenlet 1.1.2
py-spy 0.3.12
However, when I try the same on Python 3.9, a really annoying thing happens in greenlets. A side-by-side diff is shown below between two stacks as printed by gdb:
...entries deleted...
0x00007f6f3da8902b in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1 0x00007f6f3da8902b in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1
0x00007f6f3d9f408e in PyVectorcall_Call () from /lib64/libpython3.9.so.1.0 0x00007f6f3d9f408e in PyVectorcall_Call () from /lib64/libpython3.9.so.1.0
0x00007f6f2028f2fd in __Pyx_PyObject_Call (kw=, 0x00007f6f2028f2fd in __Pyx_PyObject_Call (kw=,
arg=(<gevent._gevent_cqueue.Queue at remote 0x7f6f1c8a2dc0>, <gevent._gev arg=(<gevent._gevent_cqueue.Queue at remote 0x7f6f1c8a2dc0>, <gevent._gev
__pyx_pf_6gevent_17_gevent_cgreenlet_8Greenlet_42run (__pyx_v_self=0x7f6f1d1d __pyx_pf_6gevent_17_gevent_cgreenlet_8Greenlet_42run (__pyx_v_self=0x7f6f1d1d
at src/gevent/greenlet.c:14437 at src/gevent/greenlet.c:14437
__pyx_pw_6gevent_17_gevent_cgreenlet_8Greenlet_43run ( __pyx_pw_6gevent_17_gevent_cgreenlet_8Greenlet_43run (
__pyx_v_self=<gevent._gevent_cgreenlet.Greenlet at remote 0x7f6f1d1df370> __pyx_v_self=<gevent._gevent_cgreenlet.Greenlet at remote 0x7f6f1d1df370>
unused=<optimized out>) at src/gevent/greenlet.c:14338 unused=<optimized out>) at src/gevent/greenlet.c:14338
0x00007f6f3da89902 in method_vectorcall () from /lib64/libpython3.9.so.1.0 0x00007f6f3da89902 in method_vectorcall () from /lib64/libpython3.9.so.1.0
0x00007f6f3da32b3b in PyObject_Call () from /lib64/libpython3.9.so.1.0 0x00007f6f3da32b3b in PyObject_Call () from /lib64/libpython3.9.so.1.0
0x00007f6f3e3fcbc4 in g_initialstub (mark=<optimized out>) at src/greenlet/gr 0x00007f6f3e3fcbc4 in g_initialstub (mark=<optimized out>) at src/greenlet/gr
0x00007f6f3e3fce6f in g_switch (kwargs=<optimized out>, args=<optimized out>, 0x00007f6f3e3fce6f in g_switch (kwargs=<optimized out>, args=<optimized out>,
target=0x7f6f1d1df370) at src/greenlet/greenlet.c:692 target=0x7f6f1d1df370) at src/greenlet/greenlet.c:692
g_switch (target=<optimized out>, args=(), kwargs=0x0) at src/greenlet/greenl g_switch (target=<optimized out>, args=(), kwargs=0x0) at src/greenlet/greenl
0x00007f6f0f0bf900 in ?? () | 0x00007f6f0f062900 in ?? ()
0x00007f6f3e51e040 in ?? () 0x00007f6f3e51e040 in ?? ()
0x00005578990e19d0 in ?? () 0x00005578990e19d0 in ?? ()
0x00007f6f3da89410 in ?? () from /lib64/libpython3.9.so.1.0 0x00007f6f3da89410 in ?? () from /lib64/libpython3.9.so.1.0
0x0000000000000000 in ?? () 0x0000000000000000 in ?? ()
As you can see above, most of the entries are the same and would be merged together in a flamegraph output from py-spy... EXCEPT for the fact that the stack frame just below the g_switch changes addresses every so often, which results in essentially N independent flame graphs rather than one hierarchical flame graph. Does anyone have guidance for me on why this might happen (only on Python 3.9) and/or how to get py-spy to work with this?
I guess one hacky option would be to have it somehow ignore the line below the last g_switch, or stop at the first frame that has a ?? for the function name, which I can explore, but any other solutions are welcome!
I'm not a greenlet expert, but as I understand it, g_switch
is the "root" of the native stack for greenlets. I don't know how gdb decides "here's where the stack ends and I should stop unwinding". I assume that the way greenlet sets up the stack is different from how standard threads look like, and therefore, gdb doesn't know when to stop. So what you're seeing below g_switch
is basically garbage. Not 100% confident in this analysis though.
I think that the best solution here is your first suggestion - cut everything after g_switch
.