
Poor scaling with many calls to add_prefetch

Open nchristensen opened this issue 1 year ago • 3 comments

Even with #755, prefetching many arrays scales poorly. By the 19th add_prefetch operation, a single add_prefetch call takes around 5 seconds on one fused Mirgecom kernel with 100+ einsums. Profiling shows that much of the time is spent in get_grid_sizes_for_insn_ids_as_dicts.

add_prefetch time (seconds, one line per successive call)
0.10026049613952637
0.11436939239501953
0.13449859619140625
0.15656232833862305
0.18714141845703125
0.22336697578430176
0.2757580280303955
0.34452342987060547 
0.4413919448852539
0.5716948509216309 
0.7412521839141846
0.9670014381408691 
1.2657301425933838 
1.6620988845825195 
2.180263042449951 
2.8522701263427734 
3.696044683456421 
4.755332946777344 
6.091721773147583 
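
To put a number on the growth, the ratio between consecutive timings can be computed directly from the list above. This is only a sketch over the 19 values reported here, assuming they are wall-clock seconds per call; the variable names are illustrative.

```python
# Timings copied from the list above (assumed to be seconds per add_prefetch call).
times = [
    0.10026049613952637, 0.11436939239501953, 0.13449859619140625,
    0.15656232833862305, 0.18714141845703125, 0.22336697578430176,
    0.2757580280303955, 0.34452342987060547, 0.4413919448852539,
    0.5716948509216309, 0.7412521839141846, 0.9670014381408691,
    1.2657301425933838, 1.6620988845825195, 2.180263042449951,
    2.8522701263427734, 3.696044683456421, 4.755332946777344,
    6.091721773147583,
]

# Ratio of each call's time to the time of the call before it.
ratios = [later / earlier for earlier, later in zip(times, times[1:])]

for call_number, ratio in enumerate(ratios, start=2):
    print(f"call {call_number:2d}: {ratio:.2f}x the previous call")

# Total time spent across all recorded calls.
total = sum(times)
print(f"total over {len(times)} calls: {total:.2f} s")
```

On these numbers every ratio is above 1, and the later calls are each roughly 1.3 times slower than the one before, so the cost compounds with each prefetch rather than growing by a fixed amount.
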

         452893 function calls (368789 primitive calls) in 8.962 seconds

   Ordered by: cumulative time
   List reduced from 557 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.962    8.962 __init__.py:779(prefetch_and_project)
        1    0.000    0.000    8.837    8.837 data.py:302(add_prefetch)
        1    0.000    0.000    8.837    8.837 data.py:153(add_prefetch_for_single_kernel)
        1    0.001    0.001    8.784    8.784 precompute.py:353(precompute_for_single_kernel)
2879/1595    0.003    0.000    8.018    0.005 __init__.py:752(wrapper)
      5/1    0.000    0.000    7.958    7.958 tools.py:800(assign_automatic_axes)
       44    0.002    0.000    7.890    0.179 __init__.py:802(get_iname_bounds)
        1    0.000    0.000    7.818    7.818 __init__.py:1031(get_grid_size_upper_bounds_as_exprs)
        1    0.000    0.000    7.818    7.818 __init__.py:990(get_grid_sizes_for_insn_ids_as_exprs)
        1    0.000    0.000    7.817    7.817 __init__.py:939(get_grid_sizes_for_insn_ids)
        1    0.002    0.002    7.817    7.817 __init__.py:845(get_grid_sizes_for_insn_ids_as_dicts)
     6948    7.439    0.001    7.482    0.001 __init__.py:925(wrapper)
       92    0.001    0.000    7.469    0.081 tools.py:352(op)
       46    0.000    0.000    3.740    0.081 tools.py:364(dim_min)
       46    0.038    0.001    3.729    0.081 tools.py:343(_get_dim_min)
       46    0.000    0.000    3.729    0.081 tools.py:370(dim_max)
       46    0.043    0.001    3.719    0.081 tools.py:339(_get_dim_max)
68309/37887    0.635    0.000    0.656    0.000 __init__.py:936(wrapper)
        1    0.001    0.001    0.512    0.512 array_buffer_map.py:196(__init__)
      154    0.002    0.000    0.479    0.003 __init__.py:1263(align_spaces)
       54    0.000    0.000    0.467    0.009 __init__.py:1312(align_two)
        1    0.000    0.000    0.467    0.467 array_buffer_map.py:173(compute_bounds)
        1    0.000    0.000    0.456    0.456 array_buffer_map.py:162(find_var_base_indices_and_shape_from_inames)
        1    0.000    0.000    0.456    0.456 array_buffer_map.py:165(<listcomp>)
        2    0.000    0.000    0.456    0.228 tools.py:379(base_index_and_length)
      462    0.069    0.000    0.434    0.001 __init__.py:1182(_align_dim_type)
       81    0.001    0.000    0.321    0.004 __init__.py:801(expr_like_add)
       84    0.313    0.004    0.320    0.004 __init__.py:769(_number_to_expr_like)
     6773    0.031    0.000    0.220    0.000 __init__.py:1178(_set_dim_id)
       49    0.000    0.000    0.215    0.004 __init__.py:1061(obj_project_out_except)
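
For anyone trying to reproduce this, a listing in the shape above (sorted by cumulative time, restricted to 30 entries) can be produced with Python's cProfile and pstats modules. The sketch below is generic: `slow_transform` and `time_and_profile` are hypothetical stand-ins, not loopy API, and the real add_prefetch call would take the place of `slow_transform`.

```python
import cProfile
import io
import pstats
import time


def slow_transform(state):
    # Hypothetical stand-in for a kernel transformation: the work done
    # grows with the amount of state accumulated by earlier calls.
    state = state + [len(state)]
    sum(i * j for i in state for j in state)
    return state


def time_and_profile(func, state, n_calls, top=30):
    """Time every call to func and profile only the last one."""
    timings = []
    profiler = cProfile.Profile()
    for i in range(n_calls):
        is_last = i == n_calls - 1
        if is_last:
            profiler.enable()
        start = time.perf_counter()
        state = func(state)
        timings.append(time.perf_counter() - start)
        if is_last:
            profiler.disable()

    # Sort by cumulative time and keep only the top entries.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return timings, stream.getvalue()


timings, report = time_and_profile(slow_transform, [], n_calls=19)
for t in timings:
    print(t)
print(report)
```

Profiling only the final call keeps the report focused on the slowest case, while the per-call timings show how the cost builds up across the sequence.
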

nchristensen • Feb 28 '23 03:02