Profiling with multiprocessing
Is there any support for profiling multiple processes? Right now, profiling code that uses multiprocessing gives me e.g.:
40.909 update_basketball_events_db  db_updater/events.py:38
└─ 40.855 map  multiprocessing/pool.py:261
   └─ 40.855 get  multiprocessing/pool.py:637
      └─ 40.855 wait  multiprocessing/pool.py:634
         └─ 40.855 wait  threading.py:533
            └─ 40.855 wait  threading.py:263
FWIW, joblib allows switching between multiprocessing and threading with a keyword argument. I changed a single call to joblib.Parallel(backend='threading', ... to get a usable pyinstrument trace in some code that normally uses multiprocessing (of course I ran pyinstrument on a smaller problem, but that was more than sufficient for profiling purposes).
@ihnorton, could you give an example of how to use joblib to profile multi-threaded Python code with pyinstrument?
I don't have a specific example handy, but the idea is that using backend='threading' prevents joblib from spawning a new subprocess; such a process would not be visible to pyinstrument. Instead, with backend='threading', all execution is kept in the main process, so pyinstrument's sampling can still happen. (The GIL still applies to Python code run with the threading module, so pyinstrument still has observability there; if I remember right, pyinstrument doesn't look at the C stack, so if you are running C code that releases the GIL you have to use something else anyway.)
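For illustration, here's a minimal sketch of that approach (the script and the busy function are my own, invented for the example; Parallel and delayed are real joblib API):

import math
from joblib import Parallel, delayed

def busy(n):
    # CPU-bound toy workload, so the function shows up in a sampled profile
    return sum(math.sqrt(i) for i in range(n))

if __name__ == '__main__':
    # backend='threading' keeps all work in the main process, where
    # pyinstrument's sampler can see it, instead of spawning subprocesses.
    results = Parallel(n_jobs=4, backend='threading')(
        delayed(busy)(2_000_000) for _ in range(8)
    )
    print(len(results))

Running this under pyinstrument should then attribute the time to busy rather than to a pool wait.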
I have the same problem; using joblib also does not give any visibility into time spent in individual functions in separate threads. @joerick, is this something that's not supported?
The following section from the README led me to believe it would work:
Pyinstrument uses a new profiling mode. Rather than using signals, pyinstrument uses a new statistical profiler built on PyEval_SetProfile. This means no more main thread restriction, no more IO errors when using Pyinstrument, and no need for a separate more 'setprofile' mode!
Now I think it just means that threads would not crash on 2.x and above. Is there any way to make this work?
│  ├─ 0.401 retrieve  joblib/parallel.py:893
│  │  └─ 0.401 get  multiprocessing/pool.py:637
│  │     └─ 0.401 wait  multiprocessing/pool.py:634
│  │        └─ 0.401 wait  threading.py:533
│  │           └─ 0.401 wait  threading.py:263
│  │              └─ 0.400 [self]
│  └─ 0.005 _terminate_backend  joblib/parallel.py:731
│     └─ 0.005 terminate  joblib/_parallel_backends.py:232
│        └─ 0.005 terminate  multiprocessing/pool.py:537
│           └─ 0.005 __call__  multiprocessing/util.py:167
│              └─ 0.005 _terminate_pool  multiprocessing/pool.py:561
│                 └─ 0.005 join  threading.py:1024
│                    └─ 0.005 _wait_for_tstate_lock  threading.py:1062
FWIW, https://pypi.org/project/py-spy/ can help if you really need multiprocessing and need to look into subprocesses right now.
You can use the ProcessPoolExecutor example from the Python docs as a simple repro for this issue (I've made the prime numbers bigger to make is_prime show up more clearly in the profile):
import concurrent.futures
import math

PRIMES = [
    1_000_000_000_100_011,
    1_000_112_354_597_667,
    1_003_229_774_283_941,
    1_011_001_110_001_111,
    1_084_051_191_974_761,
]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()
Running pyinstrument doesn't show any time spent in is_prime:
$ pyinstrument processpool_example.py
1000000000100011 is prime: True
1000112354597667 is prime: False
1003229774283941 is prime: True
1011001110001111 is prime: True
1084051191974761 is prime: True
  _     ._   __/__   _ _  _  _ _/_   Recorded: 18:16:42  Samples:  29
 /_//_/// /_\ / //_// / //_'/ //     Duration: 3.418     CPU time: 0.030
/   _/                      v3.4.2
Program: processpool_example.py
3.417 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      3.417 _run_code  runpy.py:64
      └─ 3.417 <module>  processpool_example.py:3
         └─ 3.412 main  processpool_example.py:28
            └─ 3.382 _chain_from_iterable_of_lists  concurrent/futures/process.py:478
               [6 frames hidden]  concurrent, threading, <built-in>
                  3.382 lock.acquire  <built-in>:0
As @madig mentioned, running the same code through py-spy does surface the time spent in is_prime:
$ py-spy record --subprocesses -o profile.svg -- python processpool_example.py
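If you want to stay with pyinstrument, one possible workaround (my own sketch, not a documented pyinstrument feature) is to start a Profiler inside each worker process and write a per-task report. The wrapper name profiled_is_prime and the report filenames are invented for this example; is_prime and PRIMES are imported from the repro script above:

import concurrent.futures
from pyinstrument import Profiler
from processpool_example import is_prime, PRIMES  # the repro script above

def profiled_is_prime(n):
    # Runs inside the worker process, which the parent's sampler can't see.
    profiler = Profiler()
    profiler.start()
    result = is_prime(n)
    profiler.stop()
    # One plain-text report per task; the filename scheme is arbitrary.
    with open('is_prime_%d.txt' % n, 'w') as f:
        f.write(profiler.output_text())
    return result

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for number, prime in zip(PRIMES, executor.map(profiled_is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

Each report then shows is_prime's own stack, at the cost of getting one profile per task instead of a single merged view.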