traces include time spent in garbage collector
Often, when I record a trace, it will contain one or two lines with very large (and not entirely believable) sample counts. I suspect this is because the garbage collector is kicking in.
Whilst it's in theory useful to know that the garbage collector is using CPU time, it's not terribly helpful from the point of view of looking for parts of my app I can optimise.
Would it be possible to detect that the GC is active, and omit such samples from the report?
If it's GC time, you might be able to see it by enabling native stack traces - we include certain Python internals, like the GC, in the stack traces.
Do you have a script that reproduces this? Python objects are ref-counted, and the GC will only kick in if you have cyclic references between objects. I'm wondering if this is caused by a different problem, like GIL contention, rather than GC - unless you're allocating a ton of objects with cyclic references, there shouldn't be a large GC pause.
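For example, here's a minimal sketch (the Node class is illustrative, not from your app) of the kind of allocation pattern that makes the cyclic collector do real work:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

# Each pair keeps itself alive through mutual references, so reference
# counting alone can never free it - only the cyclic collector can.
for _ in range(100_000):
    a, b = Node(), Node()
    a.other, b.other = b, a

# Reports how many unreachable objects the collector found this pass.
print(gc.collect())
```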
Also worth checking out https://twitter.com/tmr232/status/1387049432319873035 =)
yeah, I'm certainly not ruling out user error here :)
I've got a long-running application, and one reason to suspect GC is that, each time I profile it, I see it getting "stuck" at a completely different point in the code.
-n (native stack traces) doesn't seem to show anything particularly interesting.
Another datapoint: I'm running with -g (my application uses threads, and if I don't use -g I get hundreds of samples from threads waiting on a semaphore). I'm not sure what bearing that might have on whether I'd expect to see GC time in the reports.
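For reference, the invocation is something like this (the pid is a placeholder for my app's process id):

```
py-spy record --gil --pid <PID> --output profile.svg
```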
hmm - that does sound suspicious.
Can you try disabling GC altogether and see if the problem still happens? Instagram does this https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172 and found some benefits.
I'm wondering what happens if you call something like

```python
import gc

gc.disable()         # turn off automatic cyclic garbage collection
gc.set_threshold(0)  # zero the allocation thresholds as well, for good measure
```

If you then don't get stuck at any point, you can say with some confidence that it's GC related.
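A complementary check that doesn't require disabling anything: ask the collector to log every run, and see whether those runs line up with where the profiler gets "stuck". A minimal sketch, using only the standard library:

```python
import gc

# Print a summary to stderr each time a cyclic collection runs
# (generation collected, unreachable object count, elapsed time).
gc.set_debug(gc.DEBUG_STATS)
```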
yep. Of course, doing so requires restarting the process, which changes things somewhat. Still, I've tried it, and it certainly looks like the results are more consistent with what I would expect.
it also makes my memory usage do this, so I think the answer to "unless you're allocating a ton of objects with cyclic references" is "yes":

[screenshot: memory usage graph after disabling GC]
yup - that definitely looks like it is GC issues.
I don't quite know how to figure out if the target process is in a GC cycle right now - this might be pretty tricky to implement =(
This might be naive, but you already inspect the _PyRuntime to see if a thread has the GIL. There is likewise a _PyRuntime.gc.collecting field, which is set to 1 while the GC is running (https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/gcmodule.c#L2266 etc.), which might serve the purpose here?
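For what it's worth, inside the process the same state is observable via gc.callbacks, which fire with "start"/"stop" around every collection - py-spy would obviously need to read the flag out of the remote process's memory instead, but this is a quick way to sanity-check the semantics. A sketch:

```python
import gc

gc_active = False  # should mirror what _PyRuntime.gc.collecting reports

def on_gc(phase, info):
    # phase is "start" or "stop"; info includes the generation being
    # collected and, on "stop", counts of collected objects.
    global gc_active
    gc_active = (phase == "start")

gc.callbacks.append(on_gc)
```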
There is a prototype of this at #515 - it displays the percentage of samples spent in GC in the record command, and also has an option to exclude GC samples from the record output.
Note that this currently only works on x86_64 Linux/Mac with Python 3.9+, and doesn't work for profiling subprocesses. If you want to try it out, you can download the wheels from the GitHub Actions artifacts here https://github.com/benfred/py-spy/actions/runs/2794391704