`glb_to_rank` for large number of receivers
The `glb_to_rank` function in `distributed` is one of the remaining computational bottlenecks when distributing the coordinates of sparse objects.
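For context, here is a minimal sketch of what a global-index-to-rank lookup does along one dimension, assuming the dimension is split into contiguous slabs. This is not Devito's implementation; `make_owner_lookup` and `shape_per_rank` are hypothetical names used only for illustration.

```python
from bisect import bisect_right


def make_owner_lookup(shape_per_rank):
    """Build a lookup from a global index to the owning rank (one dimension).

    shape_per_rank: number of grid points owned by each rank (hypothetical input).
    """
    # Cumulative upper bounds: rank r owns indices [bounds[r-1], bounds[r])
    bounds = []
    total = 0
    for n in shape_per_rank:
        total += n
        bounds.append(total)

    def owner_of(index):
        if not 0 <= index < total:
            raise IndexError(f"global index {index} outside [0, {total})")
        # Binary search over the slab boundaries
        return bisect_right(bounds, index)

    return owner_of


lookup = make_owner_lookup([4, 3, 3])
ranks = [lookup(i) for i in range(10)]
print(ranks)
```

Each lookup is cheap on its own; the cost discussed in this issue comes from repeating such a lookup in pure Python for every sparse point.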
Can you give more detail? Any profiling information?
Regarding the slowdown we see with thousands of receivers in the examples: I spent some time in VTune because I was working on some clusters today, but then I remembered that the bottleneck is in Python-land. I attach 2 files here, produced by running `mpirun -n 2 python3 -m cProfile -s time examples/seismic/acoustic/acoustic_example.py -d 500 500 50 --tn 10` on my local laptop (I will try better machines as well, though I think the problem is obvious).
File cpu2.log was produced by running the default example (top 5 by internal time):
256889387 function calls (256374925 primitive calls) in 436.819 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
4000032 48.857 0.000 245.024 0.000 sparse.py:627(<genexpr>)
5000411 40.648 0.000 170.159 0.000 data.py:401(_index_glb_to_loc)
6000757 40.071 0.000 109.028 0.000 data.py:342(_normalize_index)
5000390 32.184 0.000 327.336 0.000 data.py:189(__getitem__)
12190115 31.820 0.000 39.914 0.000 utils.py:31(as_tuple)
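A listing like the one above can also be produced programmatically. The following is a minimal sketch using only the standard-library `cProfile` and `pstats` modules; `slow_lookup` is a hypothetical stand-in workload (many tiny per-element Python calls), not Devito code.

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    # Stand-in workload: one Python-level indexing call per element
    data = list(range(n))
    return sum(data[i] for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_lookup(100_000)
profiler.disable()

# Sort by internal time (tottime) and keep only the top 5 entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by `tottime` matches the `-s time` flag used on the command line, so the generator expression and the per-element calls should show up at the top in the same way.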
11366072 function calls (10851759 primitive calls) in 15.949 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3 0.689 0.230 2.642 0.881 operator.py:583(apply)
5958 0.513 0.000 0.516 0.000 {built-in method numpy.array}
12 0.378 0.032 0.378 0.032 {method 'fill' of 'numpy.ndarray' objects}
982618/982233 0.338 0.000 0.415 0.000 {built-in method builtins.isinstance}
313086 0.274 0.000 0.388 0.000 random.py:250(_randbelow_with_getrandbits)
This is from `DEVITO_LANGUAGE=openmp OMP_NUM_THREADS=8 tmpi 2 python benchmark.py run -P elastic -op forward -d 492 492 492 -so 12 --tn 50 --autotune off --dump-norms "/tmp/norms0.txt"` on hero (my workstation), on top of devito 976fda2a2.
Proposal: make coordinates immutable; this is a potentially invasive change. Alternative: a dirty flag to avoid recomputing `_dist_datamap` if the coordinates haven't changed.
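The dirty-flag alternative could look roughly like the sketch below. Everything here is an assumption made for illustration: `SparseCoords`, `slab_width` and `_compute_datamap` are hypothetical, and the placeholder computation only stands in for the real coordinate-to-rank mapping.

```python
class SparseCoords:
    """Hypothetical stand-in for a sparse object; not Devito's class."""

    def __init__(self, coordinates, slab_width=10.0):
        self._slab_width = slab_width
        self._coordinates = self._freeze(coordinates)
        self._datamap = None
        self._dirty = True        # nothing cached yet
        self.recomputes = 0       # counter kept only for the demonstration

    @staticmethod
    def _freeze(coordinates):
        # Store an immutable snapshot so in-place edits cannot bypass the flag
        return tuple(tuple(float(c) for c in point) for point in coordinates)

    @property
    def coordinates(self):
        return self._coordinates

    @coordinates.setter
    def coordinates(self, values):
        self._coordinates = self._freeze(values)
        self._dirty = True        # invalidate the cached map

    def _compute_datamap(self):
        # Placeholder for the expensive coordinate-to-rank computation
        self.recomputes += 1
        return tuple(int(p[0] // self._slab_width) for p in self._coordinates)

    @property
    def dist_datamap(self):
        # Recompute only when the coordinates changed since the last call
        if self._dirty:
            self._datamap = self._compute_datamap()
            self._dirty = False
        return self._datamap


rec = SparseCoords([(5.0, 1.0), (25.0, 3.0)])
first = rec.dist_datamap
second = rec.dist_datamap          # served from the cache
rec.coordinates = [(31.0, 0.0)]    # marks the cache dirty
third = rec.dist_datamap           # recomputed
```

Note that the flag is only reliable if every write goes through the setter. With a mutable array exposed to the user, in-place writes would bypass it, which is part of why the immutable-coordinates option is on the table.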