Feature Request: Attach & detach to a running process
I'd like to be able to run a long-running process (for instance a server), attach to it, trace the calls, detach from it, and get the frame timings.
Cool idea. Could perhaps be possible by using https://github.com/lmacken/pyrasite?
Yes, but it requires gdb which I think is the opposite of what this project aims for.
For my use case a specific solution using https://github.com/Microsoft/ptvsd would do but if there's a more general way to do it I prefer so.
true. I didn't actually manage to get pyrasite working on my mac just now. how are you imagining this working then? there's a dormant bit of pyinstrument that's already loaded into your server, and another process can message it to start/stop profiling and send output?
That is definitely one option. I'd try to look for a standard way to attach to debug process and get back with my findings.
I had similar idea, and here's how I would design it:
- Program calls `pyinstrument.enable_background_mode("/tmp/profile/")` on startup.
- This installs a signal handler.

Later:

- User sends SIGUSR2 to the process. This does `Profiler().start()`.
- User waits a minute.
- User sends SIGUSR2 to the process again. This stops the profiler and dumps output to `/tmp/profile/<current timestamp>`.
Would you like the above as a PR? I could also do it as separate package.
I think it can be useful as part of this package
Yes, I think this is a good idea! I'm a little wary of using signals for this, but it's probably okay, and they give us the ability to drop a callback onto the main thread. I guess, though, that profiling other threads wouldn't be possible with this, nor would adding options to the profile request.
For that reason, and to make it a little more user friendly, I'd suggest changing the interface a little - instead of documenting the SIGUSR2 mechanism, we'd have a `pyinstrument --attach <pid>`, which would wait for ctrl-c and then print output. Internally, it would use the mechanism you describe. We'd also provide a default profile location so the program could just write `pyinstrument.enable_background_mode()`.
By documenting `pyinstrument --attach <pid>` instead of SIGUSR2, we're not locked in and can change the underlying mechanism later, to add options like which thread to profile, or if we want to stream output.
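A rough sketch of what `pyinstrument --attach <pid>` could do internally, per the interface proposed above. Everything here is hypothetical (the `attach` function, and SIGUSR2 as the toggle signal are assumptions from this design, not pyinstrument APIs):

```python
import os
import signal

# assumed toggle signal, per the SIGUSR2 mechanism described above
TOGGLE_SIGNAL = signal.SIGUSR2

def attach(pid, wait=input):
    """Hypothetical core of `pyinstrument --attach <pid>`: toggle profiling
    on in the target process, block until the user is done, toggle it off
    so the target dumps its report."""
    os.kill(pid, TOGGLE_SIGNAL)  # first signal: target starts profiling
    try:
        wait("profiling... press Enter (or ctrl-c) to stop: ")
    except KeyboardInterrupt:
        pass
    os.kill(pid, TOGGLE_SIGNAL)  # second signal: target stops and dumps
```

The `wait` parameter is only there so the blocking step can be swapped out; a real CLI would read the dumped report from the default profile location afterwards and print it.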
Would you like the above as a PR?
Yes! Happy to:
- get a PR for the mechanism you describe, and I can implement my interface on top
- or, a PR for the whole lot would be amazing.
Ah, threads :( It looks like you can only inject exceptions into other threads with `PyThreadState_SetAsyncExc`, not arbitrary code? That's annoying. I wonder if there's another way...
Would main-thread-only be a deal-breaker for your use case?
Just thinking about common use cases like threaded WSGI servers.
yeah..... it's a pain for that use case. I'm wondering if we could use `threading.setprofile` to cover that off? But that would require the profiling to be always active, even if nothing was listening. Sounds like a performance hit.
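For reference, a minimal demo of how `threading.setprofile` works: the profile function is installed for every thread started *after* the call, which is why it would have to be active from program startup even when nobody is listening:

```python
import threading

calls = []

def profile_func(frame, event, arg):
    # record every Python-level call, tagged with the thread it ran on
    if event == "call":
        calls.append((threading.current_thread().name, frame.f_code.co_name))

# applies to threads started after this point, not to existing threads
threading.setprofile(profile_func)

def work():
    sum(range(10))

t = threading.Thread(target=work, name="worker")
t.start()
t.join()

threading.setprofile(None)  # uninstall for subsequently started threads
```

Note that this does nothing for threads that are already running, which is exactly the gap the `PyThreadState`-editing approach discussed below tries to close.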
If you did that at the C level, it would be a C-level if statement on every traceable event (`if (!actively_profiling) { return; }`). Which is... still a performance hit, though one wonders how much.
I've been looking around at other profilers, there's one called 'yappi' that can do this - it can make itself the profiler for all active threads, by editing a couple of fields on the PyThreadState object... check it out
That makes me a little twitchy, but I guess if the setup API had strict Python version checks...
Original implementation is presumably based on https://github.com/python/cpython/blob/b9a0376b0dedf16a2f82fa43d851119d1f7a2707/Python/ceval.c#L4700
For `sys.setprofile` it says "The function is thread-specific, but there is no way for the profiler to know about context switches between threads, so it does not make sense to use this in the presence of multiple threads." Is that still the case, or can pyinstrument reasonably get meaningful results when installed in multiple threads?
> The function is thread-specific, but there is no way for the profiler to know about context switches between threads, so it does not make sense to use this in the presence of multiple threads.
That confuses me, tbh. I guess it's partially true, in the sense that when looking at one thread's results you can't easily correlate which actions causes which CPU usage? But for the case of the multithreaded WSGI server, there aren't context switches that we care about because each request is handled on one thread only. Am I reading that right?
> Original implementation is presumably based on https://github.com/python/cpython/blob/b9a0376b0dedf16a2f82fa43d851119d1f7a2707/Python/ceval.c#L4700
`pyinstrument_cext` uses `PyEval_SetProfile`, which is one function up in that file.
> That makes me a little twitchy, but I guess if the setup API had strict Python version checks...
Yeah, it's a little hacky. But if a new version of Python changed its implementation, I'd expect to see that in CI failures. It could be a fun thing to implement! The starting place would be an all-threads version of `pyinstrument_cext.setprofile`.
As a feature/PR, it should probably be separated from the 'attach' functionality too.
On further thought, the issue with context switches applies to tracing profilers only:
- Function A in thread T1 starts
- Context switch to T2
- Later...
- Context switch back to T1
- Function A in thread T1 finishes
So now the profiler thinks that function A took longer than it actually did. Whereas a sampling profiler shouldn't have that issue, I think?
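The mis-attribution can be shown with a toy wall-clock tracing profiler (not pyinstrument's mechanism, just an illustration of the scenario above): here `time.sleep` stands in for T1 being descheduled to T2 mid-call, and the whole interval gets charged to function A even though A did almost no work itself.

```python
import sys
import time

timings = {}
stack = []

def tracer(frame, event, arg):
    # toy wall-clock tracing profiler: charges all elapsed time between a
    # function's call and return to that function, including time the
    # thread spent descheduled
    if event == "call":
        stack.append((frame.f_code.co_name, time.perf_counter()))
    elif event == "return" and stack:
        name, t0 = stack.pop()
        timings[name] = time.perf_counter() - t0

def function_a():
    # stands in for T1 being context-switched out to T2 mid-call
    time.sleep(0.2)

sys.setprofile(tracer)
function_a()
sys.setprofile(None)
# function_a did ~no CPU work, yet the tracer charges it ~0.2s
```

A sampling profiler would only take stack snapshots while the thread is actually executing, so the descheduled interval wouldn't inflate A's time in the same way.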
Anyway, for the feature, there are three parts:
- API to install the profiler in all threads.
- `enable_background_profiler()` (or some better name) low-level implementation.
- `pyinstrument attach` (which is a very good idea).
Do you want to do any of those? I can probably over time do all of them, might just take longer.
Keep in mind the common use case for servers is one thread locally, multiple threads deployed.
https://github.com/benfred/py-spy does essentially this; you might see what method it uses.
py-spy needs ptrace, so back to requiring root/privileged processes, which is what I'm trying to avoid.
A very simple solution that could go quite a long way would be to add a `--max-duration=SECONDS` option that runs the python program for the given amount of time (unless it quits by itself before), then halts and displays the instrumentation report. Then you can start a server with e.g. 2 minutes duration, apply your load and get the result.
@DanielJanzon in that case, my normal pattern is to run the server with `pyinstrument <file.py>`, apply load, and then hit ctrl-c to kill it - pyinstrument prints the profile even if the program exits with failure.
Ok, that didn't work for me due to other reasons (ctrl-c/SIGINT was not enough to stop the server). But I agree your proposed way is the way to go. Thanks.