PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

Fix errors in existing-code tutorials.

Open arporter opened this issue 1 year ago • 18 comments

A ticket to capture issues found when running the PSyclone training.

arporter avatar Jun 28 '24 10:06 arporter

  • Error in part 1. "sched" should be "psyir": image

arporter avatar Jun 28 '24 10:06 arporter

The small comment 'you can try to use apeg_dl_timer' is nearly impossible :( Here what I had to do to get it work:

  1. Download apeg_dl_timer and compile (easy enough)
  2. Compile the psy-data wrapper for dl_esm_inf:
DL_TIMER_ROOT=/home/joerg/work/apeg-dl_timer/  make
  1. Modify the Makefile to link in apeg_dl_timer, and point to the psydata wrapper:
joerg@Joerg-Surface:~/work/psyclone/tutorial/practicals/nemo/2_nemo_profiling$ git diff Makefile
diff --git a/tutorial/practicals/nemo/2_nemo_profiling/Makefile b/tutorial/practicals/nemo/2_nemo_profiling/Makefile
index 8cac6833d..1b3ed587f 100644
--- a/tutorial/practicals/nemo/2_nemo_profiling/Makefile
+++ b/tutorial/practicals/nemo/2_nemo_profiling/Makefile
@@ -45,6 +45,11 @@ PROFILE_DIR ?= ../../../../lib/profiling/simple_timing
 PROFILE_LINK = -lsimple_timing
 PROFILE_LIB = ${PROFILE_DIR}/libsimple_timing.a
 
+PROFILE_DIR ?= ./../../../../lib/profiling/dl_timer/
+PROFILE_LINK = -ldl_timer_psy
+PROFILE_LIB = ${PROFILE_DIR}/libdl_timer_psy.a
+DL_TIMER_LIB = ~/work/apeg-dl_timer/libdl_timer_omp.a
+
 NAME = ./tra_adv.exe
 
 # Use gfortran by default
@@ -84,8 +89,8 @@ transform:
                      -o output_3.f90 -l output tra_adv_mod.F90
 
 compile: transform $(KERNELS) output.o solutions/runner.o
-       $(F90) $(KERNELS) output.o solutions/runner.o -o $(NAME) \
-              -L$(PROFILE_DIR) $(PROFILE_LINK)
+       $(F90) $(F90FLAGS) $(KERNELS) output.o solutions/runner.o -o $(NAME) \
+              -L$(PROFILE_DIR) $(PROFILE_LINK) $(DL_TIMER_LIB)
 
 # Only used for the compile CI target to compile the solution file
 solutions/%.o: solutions/%.f90

Compile with additional F90FLAGS (which were added to the link line in the previous change):

make F90FLAGS=-fopenmp  run -j 2

OpenMP is required if apeg_dl_timer is compiled with openmp, even if no openmp is applied by a transformation

My suggested solution: we add a second Makefile that has all the changes applied. The two cases of simple_timing and dl_timer are quite different, since simple_timer does not need to link in an existing library :(

hiker avatar Jun 28 '24 12:06 hiker

Can't you change PROFILE_LINK?

If I build dl_timer in MPI mode and then set:

PROFILE_DIR ?= ../../../../lib/profiling/dl_timer
PROFILE_LIB = ${PROFILE_DIR}/libdl_timer_psy.a
PROFILE_LINK = ${PROFILE_LIB} ~/Projects/dl_timer/libdl_timer_mpi.a

and export F90=mpif90 then it builds for me. We need to add comments/and or a new variable that holds the location of the profiling library as well as the wrapper library.

arporter avatar Jun 28 '24 12:06 arporter

PROFILE_LINK is for the psydata wrapper. Well, I guess you could also add the options for the actual timer lib.

hiker avatar Jun 28 '24 12:06 hiker

Error in tutorial 3: image (note use of child in loop body)

arporter avatar Jun 28 '24 12:06 arporter

3.1: OMPParallelLoopTrans (probably ParallelLoopTrans) validate just says there's a CodeBlock. It would be better if it printed the contents of the CodeBlock.

arporter avatar Jun 28 '24 13:06 arporter

3_nemo_omp/omp_trans.py already has a walk in it but the tutorial instructions say to add one: image I guess originally we just had a loop over children? And the text above this is wrong too (last para): image

arporter avatar Jun 28 '24 13:06 arporter

Besides shutdown, the runner MUST also call the init function for dl_timer. I.e.:

program runner
  use tra_adv_mod, only: tra_adv
  use profile_psy_data_mod, only: profile_PSyDataInit, profile_psydatashutdown

  call profile_PSyDataInit()
  call tra_adv()

  call profile_psydatashutdown()

end program runner

https://bitbucket.org/apeg/dl_timer/issues/14/check-that-init-has-been-called

hiker avatar Jun 28 '24 13:06 hiker

btw, now that the psyir starts at root and is easier to handle multiple routines easily we could split the tra_adv into multiple routines. This would allow the default "--profile routines" to be more interesting, and revert to a single file where we could insert the PSy_DataShutdown programmatically by searching only the main function. Also show InlineTrans, ACC/OMP routine/declare, HoistLocalArraysTrans, ...

sergisiso avatar Jun 28 '24 13:06 sergisiso

btw, now that the psyir starts at root and is easier to handle multiple routines easily we could split the tra_adv into multiple routines. This would allow the default "--profile routines" to be more interesting, and revert to a single file where we could insert the PSy_DataShutdown programmatically by searching only the main function. Also show InlineTrans, ACC/OMP routine/declare, HoistLocalArraysTrans, ...

Oh yes, nice. I'll let you write that bit ;-) I think that would have to be an 'advanced' course...

arporter avatar Jun 28 '24 14:06 arporter

Difficulties validating the OpenACC Kernels version of the tra_adv miniapp on NVIDIA GPU - does this work on Glados?

arporter avatar Jun 28 '24 14:06 arporter

Maybe we could add the running and results comparison part of the tutorial to the integration tests.

sergisiso avatar Jun 28 '24 14:06 sergisiso

The divergent results for example 4 seems to occur here:

This is where the first difference starts:

      !$acc kernels
      do jk = 1, jpk - 1, 1
        zdt = 1
        do jj = 2, jpj - 1, 1
          do ji = 2, jpi - 1, 1
            z0u = SIGN(0.5d0, pun(ji,jj,jk))
            zalpha = 0.5d0 - z0u
            zu = z0u - 0.5d0 * pun(ji,jj,jk) * zdt
            zzwx = mydomain(ji + 1,jj,jk) + zind(ji,jj,jk) * (zu * zslpx(ji + 1,jj,jk))
            zzwy = mydomain(ji,jj,jk) + zind(ji,jj,jk) * (zu * zslpx(ji,jj,jk))
            zwx(ji,jj,jk) = pun(ji,jj,jk) * (zalpha * zzwx + (1. - zalpha) * zzwy)
            z0v = SIGN(0.5d0, pvn(ji,jj,jk))
            zalpha = 0.5d0 - z0v
            zv = z0v - 0.5d0 * pvn(ji,jj,jk) * zdt
            zzwx = mydomain(ji,jj + 1,jk) + zind(ji,jj,jk) * (zv * zslpy(ji,jj + 1,jk))
            zzwy = mydomain(ji,jj,jk) + zind(ji,jj,jk) * (zv * zslpy(ji,jj,jk))
            zwy(ji,jj,jk) = pvn(ji,jj,jk) * (zalpha * zzwx + (1.d0 - zalpha) * zzwy)
          enddo
        enddo
      enddo
      !$acc end kernels

I've not tested it here but if it diverges it probably is here for them. Its not clear to me why - there's no ambiguity of operations and no race and no division or other complex instructions (eg sqrt) that should cause it to diverge I think

LonelyCat124 avatar Jun 28 '24 14:06 LonelyCat124

There is an assignment to a scalar (zdt) which rings a bell for me.

arporter avatar Jun 28 '24 14:06 arporter

Expose ability to run backend checks to user/script?

arporter avatar Jun 28 '24 14:06 arporter

I think in theory it is accesible - validate_global_constraints (and public) is there but you'd have to track down the node in some way (and we no longer return the result of transformations)

LonelyCat124 avatar Jun 28 '24 14:06 LonelyCat124

4.4 neglects the need to create a mapping when creating a new script: image

arporter avatar Jun 28 '24 15:06 arporter

There's another error in the same section: image We should do a psyir.walk(Loop) rather than look at children of subroutine.

arporter avatar Jun 28 '24 15:06 arporter

Difficulties validating the OpenACC Kernels version of the tra_adv miniapp on NVIDIA GPU - does this work on Glados?

For the record, we had to apply the -Mnofma flag on both the CPU and GPU builds with the NVIDIA compiler. (From memory, I think that it's the CPU that's the problem.) Anyway, I've updated the README.md of part 4 of the tutorial to mention this.

arporter avatar Jul 10 '24 13:07 arporter

The one remaining piece is now to see whether we can run part 4 of the tutorial on GPU as part of the integration tests.

arporter avatar Jul 10 '24 13:07 arporter