PSyclone Fix errors in existing-code tutorials.

A ticket to capture issues found when running the PSyclone training.

Jun 28 '24 10:06 arporter

Error in part 1. "sched" should be "psyir":

Jun 28 '24 10:06 arporter

The small comment 'you can try to use apeg_dl_timer' is nearly impossible :( Here what I had to do to get it work:

Download apeg_dl_timer and compile (easy enough)
Compile the psy-data wrapper for dl_esm_inf:

DL_TIMER_ROOT=/home/joerg/work/apeg-dl_timer/  make

Modify the Makefile to link in apeg_dl_timer, and point to the psydata wrapper:

joerg@Joerg-Surface:~/work/psyclone/tutorial/practicals/nemo/2_nemo_profiling$ git diff Makefile
diff --git a/tutorial/practicals/nemo/2_nemo_profiling/Makefile b/tutorial/practicals/nemo/2_nemo_profiling/Makefile
index 8cac6833d..1b3ed587f 100644
--- a/tutorial/practicals/nemo/2_nemo_profiling/Makefile
+++ b/tutorial/practicals/nemo/2_nemo_profiling/Makefile
@@ -45,6 +45,11 @@ PROFILE_DIR ?= ../../../../lib/profiling/simple_timing
 PROFILE_LINK = -lsimple_timing
 PROFILE_LIB = ${PROFILE_DIR}/libsimple_timing.a
 
+PROFILE_DIR ?= ./../../../../lib/profiling/dl_timer/
+PROFILE_LINK = -ldl_timer_psy
+PROFILE_LIB = ${PROFILE_DIR}/libdl_timer_psy.a
+DL_TIMER_LIB = ~/work/apeg-dl_timer/libdl_timer_omp.a
+
 NAME = ./tra_adv.exe
 
 # Use gfortran by default
@@ -84,8 +89,8 @@ transform:
                      -o output_3.f90 -l output tra_adv_mod.F90
 
 compile: transform $(KERNELS) output.o solutions/runner.o
-       $(F90) $(KERNELS) output.o solutions/runner.o -o $(NAME) \
-              -L$(PROFILE_DIR) $(PROFILE_LINK)
+       $(F90) $(F90FLAGS) $(KERNELS) output.o solutions/runner.o -o $(NAME) \
+              -L$(PROFILE_DIR) $(PROFILE_LINK) $(DL_TIMER_LIB)
 
 # Only used for the compile CI target to compile the solution file
 solutions/%.o: solutions/%.f90

Compile with additional F90FLAGS (which were added to the link line in the previous change):

make F90FLAGS=-fopenmp  run -j 2

OpenMP is required if apeg_dl_timer is compiled with openmp, even if no openmp is applied by a transformation

My suggested solution: we add a second Makefile that has all the changes applied. The two cases of simple_timing and dl_timer are quite different, since simple_timer does not need to link in an existing library :(

Jun 28 '24 12:06 hiker

Can't you change PROFILE_LINK?

If I build dl_timer in MPI mode and then set:

PROFILE_DIR ?= ../../../../lib/profiling/dl_timer
PROFILE_LIB = ${PROFILE_DIR}/libdl_timer_psy.a
PROFILE_LINK = ${PROFILE_LIB} ~/Projects/dl_timer/libdl_timer_mpi.a

and export F90=mpif90 then it builds for me. We need to add comments/and or a new variable that holds the location of the profiling library as well as the wrapper library.

Jun 28 '24 12:06 arporter

PROFILE_LINK is for the psydata wrapper. Well, I guess you could also add the options for the actual timer lib.

Jun 28 '24 12:06 hiker

Error in tutorial 3: (note use of child in loop body)

Jun 28 '24 12:06 arporter

3.1: OMPParallelLoopTrans (probably ParallelLoopTrans) validate just says there's a CodeBlock. It would be better if it printed the contents of the CodeBlock.

Jun 28 '24 13:06 arporter

3_nemo_omp/omp_trans.py already has a walk in it but the tutorial instructions say to add one: I guess originally we just had a loop over children? And the text above this is wrong too (last para):

Jun 28 '24 13:06 arporter

Besides shutdown, the runner MUST also call the init function for dl_timer. I.e.:

program runner
  use tra_adv_mod, only: tra_adv
  use profile_psy_data_mod, only: profile_PSyDataInit, profile_psydatashutdown

  call profile_PSyDataInit()
  call tra_adv()

  call profile_psydatashutdown()

end program runner

https://bitbucket.org/apeg/dl_timer/issues/14/check-that-init-has-been-called

Jun 28 '24 13:06 hiker

btw, now that the psyir starts at root and is easier to handle multiple routines easily we could split the tra_adv into multiple routines. This would allow the default "--profile routines" to be more interesting, and revert to a single file where we could insert the PSy_DataShutdown programmatically by searching only the main function. Also show InlineTrans, ACC/OMP routine/declare, HoistLocalArraysTrans, ...

Jun 28 '24 13:06 sergisiso

btw, now that the psyir starts at root and is easier to handle multiple routines easily we could split the tra_adv into multiple routines. This would allow the default "--profile routines" to be more interesting, and revert to a single file where we could insert the PSy_DataShutdown programmatically by searching only the main function. Also show InlineTrans, ACC/OMP routine/declare, HoistLocalArraysTrans, ...

Oh yes, nice. I'll let you write that bit ;-) I think that would have to be an 'advanced' course...

Jun 28 '24 14:06 arporter

Difficulties validating the OpenACC Kernels version of the tra_adv miniapp on NVIDIA GPU - does this work on Glados?

Jun 28 '24 14:06 arporter

Maybe we could add the running and results comparison part of the tutorial to the integration tests.

Jun 28 '24 14:06 sergisiso

The divergent results for example 4 seems to occur here:

This is where the first difference starts:

      !$acc kernels
      do jk = 1, jpk - 1, 1
        zdt = 1
        do jj = 2, jpj - 1, 1
          do ji = 2, jpi - 1, 1
            z0u = SIGN(0.5d0, pun(ji,jj,jk))
            zalpha = 0.5d0 - z0u
            zu = z0u - 0.5d0 * pun(ji,jj,jk) * zdt
            zzwx = mydomain(ji + 1,jj,jk) + zind(ji,jj,jk) * (zu * zslpx(ji + 1,jj,jk))
            zzwy = mydomain(ji,jj,jk) + zind(ji,jj,jk) * (zu * zslpx(ji,jj,jk))
            zwx(ji,jj,jk) = pun(ji,jj,jk) * (zalpha * zzwx + (1. - zalpha) * zzwy)
            z0v = SIGN(0.5d0, pvn(ji,jj,jk))
            zalpha = 0.5d0 - z0v
            zv = z0v - 0.5d0 * pvn(ji,jj,jk) * zdt
            zzwx = mydomain(ji,jj + 1,jk) + zind(ji,jj,jk) * (zv * zslpy(ji,jj + 1,jk))
            zzwy = mydomain(ji,jj,jk) + zind(ji,jj,jk) * (zv * zslpy(ji,jj,jk))
            zwy(ji,jj,jk) = pvn(ji,jj,jk) * (zalpha * zzwx + (1.d0 - zalpha) * zzwy)
          enddo
        enddo
      enddo
      !$acc end kernels

I've not tested it here but if it diverges it probably is here for them. Its not clear to me why - there's no ambiguity of operations and no race and no division or other complex instructions (eg sqrt) that should cause it to diverge I think

Jun 28 '24 14:06 LonelyCat124

There is an assignment to a scalar (zdt) which rings a bell for me.

Jun 28 '24 14:06 arporter

Expose ability to run backend checks to user/script?

Jun 28 '24 14:06 arporter

I think in theory it is accesible - validate_global_constraints (and public) is there but you'd have to track down the node in some way (and we no longer return the result of transformations)

Jun 28 '24 14:06 LonelyCat124

4.4 neglects the need to create a mapping when creating a new script:

Jun 28 '24 15:06 arporter

There's another error in the same section: We should do a psyir.walk(Loop) rather than look at children of subroutine.

Jun 28 '24 15:06 arporter

Difficulties validating the OpenACC Kernels version of the tra_adv miniapp on NVIDIA GPU - does this work on Glados?

For the record, we had to apply the -Mnofma flag on both the CPU and GPU builds with the NVIDIA compiler. (From memory, I think that it's the CPU that's the problem.) Anyway, I've updated the README.md of part 4 of the tutorial to mention this.

Jul 10 '24 13:07 arporter

The one remaining piece is now to see whether we can run part 4 of the tutorial on GPU as part of the integration tests.

Jul 10 '24 13:07 arporter