Feature Request: Attach & detach to a running process
I'd like to be able to run a long-running process (for instance a server), attach to it, trace the calls, detach from it, and get the frame timings.
Cool idea. Could perhaps be possible by using https://github.com/lmacken/pyrasite?
Yes, but it requires gdb which I think is the opposite of what this project aims for.
For my use case a specific solution using https://github.com/Microsoft/ptvsd would do but if there's a more general way to do it I prefer so.
true. I didn't actually manage to get pyrasite working on my mac just now. how are you imagining this working then? there's a dormant bit of pyinstrument that's already loaded into your server, and another process can message it to start/stop profiling and send output?
That is definitely one option. I'd try to look for a standard way to attach to debug process and get back with my findings.
I had similar idea, and here's how I would design it:
- Program calls `pyinstrument.enable_background_mode("/tmp/profile/")` on startup.
- This installs a signal handler.

Later:

- User sends SIGUSR2 to the process. This does `Profiler().start()`.
- User waits a minute.
- User sends SIGUSR2 to the process again. This stops the profiler and dumps output to `/tmp/profile/<current timestamp>`.
Would you like the above as a PR? I could also do it as separate package.
I think it can be useful as part of this package
Yes, I think this is a good idea! I'm a little wary of using signals for this, but it's probably okay, and they give us the ability to drop a callback onto the main thread. I guess, though, that profiling other threads wouldn't be possible with this, nor would adding options to the profile request.
For that reason, and to make it a little more user friendly, I'd suggest changing the interface a little - instead of documenting the SIGUSR2 mechanism, we'd have a `pyinstrument --attach <pid>`, which would wait for ctrl-c and then print output. Internally, it would use the mechanism you describe. We'd also provide a default profile location so the program could just write `pyinstrument.enable_background_mode()`.
By documenting `pyinstrument --attach <pid>` instead of SIGUSR2, we're not locked in and can change the underlying mechanism later, to add options like which thread to profile, or if we want to stream output.
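A rough sketch of what `pyinstrument --attach <pid>` could do internally, per the interface proposed above. Everything here is hypothetical (the `attach` function, and SIGUSR2 as the toggle signal are assumptions from this design, not pyinstrument APIs):

```python
import os
import signal

# assumed toggle signal, per the SIGUSR2 mechanism described above
TOGGLE_SIGNAL = signal.SIGUSR2

def attach(pid, wait=input):
    """Hypothetical core of `pyinstrument --attach <pid>`: toggle profiling
    on in the target process, block until the user is done, toggle it off
    so the target dumps its report."""
    os.kill(pid, TOGGLE_SIGNAL)  # first signal: target starts profiling
    try:
        wait("profiling... press Enter (or ctrl-c) to stop: ")
    except KeyboardInterrupt:
        pass
    os.kill(pid, TOGGLE_SIGNAL)  # second signal: target stops and dumps
```

The `wait` parameter is only there so the blocking step can be swapped out; a real CLI would read the dumped report from the default profile location afterwards and print it.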
Would you like the above as a PR?
Yes! Happy to:
- get a PR for the mechanism you describe, and I can implement my interface on top
- or, a PR for the whole lot would be amazing.
Ah, threads :( It looks like you can only inject exceptions into other threads with `PyThreadState_SetAsyncExc`, not arbitrary code? That's annoying. I wonder if there's another way...
Would main-thread-only be a deal-breaker for your use case?
Just thinking about common use cases like threaded WSGI servers.
yeah..... it's a pain for that use case. I'm wondering if we could use `threading.setprofile` to cover that off? But that would require the profiling to be always active, even if nothing was listening. Sounds like a performance hit.
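For reference, a minimal demo of how `threading.setprofile` works: the profile function is installed for every thread started *after* the call, which is why it would have to be active from program startup even when nobody is listening:

```python
import threading

calls = []

def profile_func(frame, event, arg):
    # record every Python-level call, tagged with the thread it ran on
    if event == "call":
        calls.append((threading.current_thread().name, frame.f_code.co_name))

# applies to threads started after this point, not to existing threads
threading.setprofile(profile_func)

def work():
    sum(range(10))

t = threading.Thread(target=work, name="worker")
t.start()
t.join()

threading.setprofile(None)  # uninstall for subsequently started threads
```

Note that this does nothing for threads that are already running, which is exactly the gap the `PyThreadState`-editing approach discussed below tries to close.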
If you did that at the C level, it would be a C-level if statement on every traceable event (`if (!actively_profiling) { return; }`). Which is... still a performance hit, though one wonders how much.
I've been looking around at other profilers, there's one called 'yappi' that can do this - it can make itself the profiler for all active threads, by editing a couple of fields on the PyThreadState object... check it out
That makes me a little twitchy, but I guess if the setup API had strict Python version checks...
Original implementation is presumably based on https://github.com/python/cpython/blob/b9a0376b0dedf16a2f82fa43d851119d1f7a2707/Python/ceval.c#L4700
For `sys.setprofile` it says "The function is thread-specific, but there is no way for the profiler to know about context switches between threads, so it does not make sense to use this in the presence of multiple threads." Is that still the case, or can pyinstrument reasonably get meaningful results when installed in multiple threads?
> The function is thread-specific, but there is no way for the profiler to know about context switches between threads, so it does not make sense to use this in the presence of multiple threads.
That confuses me, tbh. I guess it's partially true, in the sense that when looking at one thread's results you can't easily correlate which actions causes which CPU usage? But for the case of the multithreaded WSGI server, there aren't context switches that we care about because each request is handled on one thread only. Am I reading that right?
> Original implementation is presumably based on https://github.com/python/cpython/blob/b9a0376b0dedf16a2f82fa43d851119d1f7a2707/Python/ceval.c#L4700
`pyinstrument_cext` uses `PyEval_SetProfile`, which is one function up in that file.
> That makes me a little twitchy, but I guess if the setup API had strict Python version checks...
Yeah, it's a little hacky. But if a new version of Python changed its implementation, I'd expect to see that in CI failures. It could be a fun thing to implement! The starting place would be an all-threads version of `pyinstrument_cext.setprofile`.
As a feature/PR, it should probably be separated from the 'attach' functionality too.
On further thought, the issue with context switches applies to tracing profilers only:
- Function A in thread T1 starts
- Context switch to T2
- Later...
- Context switch back to T1
- Function A in thread T1 finishes
So now the profiler thinks that function A took longer than it actually did. Whereas a sampling profiler shouldn't have that issue, I think?
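The mis-attribution can be shown with a toy wall-clock tracing profiler (not pyinstrument's mechanism, just an illustration of the scenario above): here `time.sleep` stands in for T1 being descheduled to T2 mid-call, and the whole interval gets charged to function A even though A did almost no work itself.

```python
import sys
import time

timings = {}
stack = []

def tracer(frame, event, arg):
    # toy wall-clock tracing profiler: charges all elapsed time between a
    # function's call and return to that function, including time the
    # thread spent descheduled
    if event == "call":
        stack.append((frame.f_code.co_name, time.perf_counter()))
    elif event == "return" and stack:
        name, t0 = stack.pop()
        timings[name] = time.perf_counter() - t0

def function_a():
    # stands in for T1 being context-switched out to T2 mid-call
    time.sleep(0.2)

sys.setprofile(tracer)
function_a()
sys.setprofile(None)
# function_a did ~no CPU work, yet the tracer charges it ~0.2s
```

A sampling profiler would only take stack snapshots while the thread is actually executing, so the descheduled interval wouldn't inflate A's time in the same way.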
Anyway, for the feature, there are three parts:
- API to install the profiler in all threads.
- `enable_background_profiler()` (or some better name) low-level implementation.
- `pyinstrument attach` (which is a very good idea).
Do you want to do any of those? I can probably over time do all of them, might just take longer.
Keep in mind the common use case for servers is one thread locally, multiple threads deployed.
https://github.com/benfred/py-spy does essentially this; you might see what method it uses.
py-spy needs ptrace, so back to requiring root/privileged processes, which is what I'm trying to avoid.
A very simple solution that could go quite a long way would be to add a `--max-duration=SECONDS` option that runs the python program for the given amount of time (unless it quits by itself before), then halts and displays the instrumentation report. Then you can start a server with e.g. 2 minutes duration, apply your load and get the result.
@DanielJanzon in that case, my normal pattern is to run the server with `pyinstrument <file.py>`, apply load, and then hit ctrl-c to kill it - pyinstrument prints the profile even if the program exits with failure.
Ok, that didn't work for me due to other reasons (ctrl-c/SIGINT was not enough to stop the server). But I agree your proposed way is the way to go. Thanks.