
`glb_to_rank` for large number of receivers

Open mloubout opened this issue 4 years ago • 3 comments

The `glb_to_rank` function in `distributed` is one of the remaining computational bottlenecks when distributing the coordinates of sparse objects.

mloubout avatar Feb 05 '20 14:02 mloubout

Can you give more detail? Any profiling information?

ggorman avatar Feb 07 '20 08:02 ggorman

Regarding the slowdown we see with thousands of receivers in the examples: I spent some time in VTune because I was working on some clusters today, but then I remembered that the bottleneck is in Python-land. I attach 2 files here from running `mpirun -n 2 python3 -m cProfile -s time examples/seismic/acoustic/acoustic_example.py -d 500 500 50 --tn 10` on my local laptop (I will try better machines as well, though I think the problem is obvious).
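For anyone who wants the same "top 5 by internal time" view from inside a script rather than via `-m cProfile -s time`, here is a minimal sketch using only the standard-library `cProfile` and `pstats` modules. The `work` function is a made-up stand-in workload, not anything from Devito.

```python
import cProfile
import io
import pstats


def work(n):
    # Hypothetical stand-in workload; replace with the code to be profiled.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = work(100_000)
profiler.disable()

# Sort by internal time ("tottime", the same ordering as `-s time`)
# and keep only the first 5 entries of the report.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report header reads "Ordered by: internal time", matching the listings in this comment.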

File cpu2.log was produced from running the default example (top 5 by time):
         256889387 function calls (256374925 primitive calls) in 436.819 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4000032   48.857    0.000  245.024    0.000 sparse.py:627(<genexpr>)
  5000411   40.648    0.000  170.159    0.000 data.py:401(_index_glb_to_loc)
  6000757   40.071    0.000  109.028    0.000 data.py:342(_normalize_index)
  5000390   32.184    0.000  327.336    0.000 data.py:189(__getitem__)
 12190115   31.820    0.000   39.914    0.000 utils.py:31(as_tuple)

Second profile output (top 5 by time):
         11366072 function calls (10851759 primitive calls) in 15.949 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.689    0.230    2.642    0.881 operator.py:583(apply)
     5958    0.513    0.000    0.516    0.000 {built-in method numpy.array}
       12    0.378    0.032    0.378    0.032 {method 'fill' of 'numpy.ndarray' objects}
982618/982233    0.338    0.000    0.415    0.000 {built-in method builtins.isinstance}
   313086    0.274    0.000    0.388    0.000 random.py:250(_randbelow_with_getrandbits)

georgebisbas avatar Mar 12 '21 20:03 georgebisbas

elastic_mpi_profile_rank0.pdf

from

DEVITO_LANGUAGE=openmp OMP_NUM_THREADS=8 tmpi 2 python benchmark.py run -P elastic -op forward -d 492 492 492 -so 12 --tn 50 --autotune off --dump-norms "/tmp/norms0.txt"

on hero (my workstation), on top of devito 976fda2a2

Proposal: make coordinates immutable; this is a potentially invasive change. Alternative: a dirty flag to avoid recomputing `_dist_datamap` if the coordinates haven't changed.
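
A minimal sketch of the dirty-flag alternative, assuming a toy class and a 1D slab decomposition. `SparsePoints`, `dist_datamap`, `edges` and `recomputations` are hypothetical names for illustration and do not reflect Devito's classes or the real `_dist_datamap` logic.

```python
import numpy as np


class SparsePoints:
    """Hypothetical stand-in for a sparse object; not Devito's API."""

    def __init__(self, coordinates, edges):
        self._coordinates = np.array(coordinates, dtype=float)
        self._edges = np.asarray(edges, dtype=float)  # lower bound of each rank's slab
        self._dirty = True       # the cached map must be (re)built
        self._datamap = None
        self.recomputations = 0  # counter, only here to show the caching

    @property
    def coordinates(self):
        # Hand out a read-only view so in-place writes cannot bypass the flag.
        view = self._coordinates.view()
        view.flags.writeable = False
        return view

    @coordinates.setter
    def coordinates(self, values):
        self._coordinates = np.array(values, dtype=float)
        self._dirty = True       # invalidate the cached map

    @property
    def dist_datamap(self):
        # Recompute only if the coordinates changed since the last call.
        if self._dirty:
            self._datamap = self._compute_datamap()
            self._dirty = False
        return self._datamap

    def _compute_datamap(self):
        self.recomputations += 1
        owners = np.searchsorted(self._edges, self._coordinates, side="right") - 1
        return {int(r): np.flatnonzero(owners == r) for r in np.unique(owners)}


points = SparsePoints([0.0, 100.0, 250.0, 499.0], edges=[0.0, 250.0])
points.dist_datamap
points.dist_datamap              # served from the cache
print(points.recomputations)
points.coordinates = [300.0, 10.0]
points.dist_datamap              # recomputed after the change
print(points.recomputations)
```

The read-only view is a middle ground between the two options: coordinates stay replaceable through the setter, but element-wise writes that would silently leave the cache stale are rejected.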

FabioLuporini avatar Jun 25 '21 13:06 FabioLuporini