Potential problem with ExeSpaceUtils view_reduction and parallel_reduce

Open jgfouca opened this issue 2 years ago • 5 comments

Describe the bug

This was discovered when porting shoc_energy_integrals to small kernels. I was getting large differences in the outputs of the view_reductions when num_threads > 1. I suspect the problem is in how the garbage entries in the last pack are handled, because the problem went away when I used an nlev with nlev % pack_size == 0.
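To make the suspected failure mode concrete, here is a minimal sketch in plain C++. It does not use or reproduce the EKAT view_reduction code; `Pack`, `make_packed`, `sum_all_lanes`, and `sum_valid_lanes` are hypothetical names invented for illustration, and the pack width of 4 is an assumption. It only shows how unused lanes in the last pack can leak into a sum when nlev is not a multiple of pack_size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD pack: a fixed-size group of scalars.
// The width of 4 is an assumption made for this sketch.
constexpr std::size_t pack_size = 4;
struct Pack { double d[pack_size]; };

// Store vals.size() levels in ceil(nlev / pack_size) packs.
// Unused lanes of the last pack are set to `fill` to mimic garbage.
std::vector<Pack> make_packed(const std::vector<double>& vals, double fill) {
  const std::size_t npacks = (vals.size() + pack_size - 1) / pack_size;
  std::vector<Pack> packs(npacks);
  for (std::size_t p = 0; p < npacks; ++p) {
    for (std::size_t s = 0; s < pack_size; ++s) {
      const std::size_t k = p * pack_size + s;
      packs[p].d[s] = (k < vals.size()) ? vals[k] : fill;
    }
  }
  return packs;
}

// Naive reduction: adds every lane of every pack, padding included.
double sum_all_lanes(const std::vector<Pack>& packs) {
  double total = 0.0;
  for (const Pack& p : packs)
    for (std::size_t s = 0; s < pack_size; ++s) total += p.d[s];
  return total;
}

// Masked reduction: adds only lanes whose global index is below nlev.
double sum_valid_lanes(const std::vector<Pack>& packs, std::size_t nlev) {
  double total = 0.0;
  for (std::size_t p = 0; p < packs.size(); ++p)
    for (std::size_t s = 0; s < pack_size; ++s)
      if (p * pack_size + s < nlev) total += packs[p].d[s];
  return total;
}
```

With 6 levels and a pack width of 4, the last pack has two padding lanes, so the two functions disagree whenever the padding is nonzero. With 8 levels there is no padding and they agree, which is consistent with the observation that the problem disappears when nlev % pack_size == 0.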

To Reproduce

Steps to reproduce the behavior:

  1. Switch shoc_energy_integrals back to the implementation it had before the small kernel PR, i.e. the one that uses view_reductions.
  2. Build SCREAM with -DSCREAM_SMALL_KERNELS=On -DCMAKE_BUILD_TYPE=Debug
  3. Run OMP_NUM_THREADS=16 ./shoc_tests shoc_main_bfb
  4. This should fail because the results are not BFB with Fortran. You can add print statements to confirm that the se_int, ke_int, wv_int, and wl_int values do not match Fortran, which causes different results later in shoc for the output views.

Expected behavior

view_reduction should produce BFB results with Fortran.
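For readers unfamiliar with the term, BFB (bit-for-bit) means the results must match in every bit, not just to within a tolerance. The sketch below, in plain C++, shows what such a check looks like and why it is strict: floating-point addition is not associative, so summing the same values in a different grouping can change the bits. It is not the EKAT or SCREAM test code; `bitwise_equal`, `serial_sum`, and `chunked_sum` are hypothetical helpers, and nothing here establishes that summation order is the cause of this particular issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True only if the two doubles have identical bit patterns.
bool bitwise_equal(double a, double b) {
  std::uint64_t ua = 0, ub = 0;
  std::memcpy(&ua, &a, sizeof a);
  std::memcpy(&ub, &b, sizeof b);
  return ua == ub;
}

// Left-to-right sum, as a serial loop would compute it.
double serial_sum(const std::vector<double>& v) {
  double total = 0.0;
  for (double x : v) total += x;
  return total;
}

// Sum contiguous chunks separately, then combine the partial sums in
// chunk order. Assumes nchunks >= 1. This mimics one way work might be
// split across threads; it is not how any particular library does it.
double chunked_sum(const std::vector<double>& v, std::size_t nchunks) {
  const std::size_t n = v.size();
  const std::size_t chunk = (n + nchunks - 1) / nchunks;
  double total = 0.0;
  for (std::size_t c = 0; c < nchunks; ++c) {
    const std::size_t begin = c * chunk;
    const std::size_t end = std::min(begin + chunk, n);
    double partial = 0.0;
    for (std::size_t k = begin; k < end; ++k) partial += v[k];
    total += partial;
  }
  return total;
}
```

For an input such as {1e16, 1.0, -1e16, 1.0}, the serial sum and the two-chunk sum group the additions differently and round differently, so a bit-for-bit comparison between them fails even though both are valid floating-point sums of the same data.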

jgfouca, Sep 27 '22 16:09