Improve documentation for usage with multi-process runs

Open gsitaram opened this issue 2 years ago • 6 comments

Could some guidance be added to the documentation for using omniperf with MPI jobs? Should we collect profiles for one rank only, using a wrapper script (see the example below) invoked as mpirun <...> wrapper_omniperf.sh <...> <exe>? Or should we run omniperf <...> mpirun <...> <exe>? A sample wrapper script that I tried is:

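#!/usr/bin/env bash
# Determine this process's MPI rank from the launcher's environment.
if [[ -n ${OMPI_COMM_WORLD_RANK+z} ]]; then
  # Open MPI
  export MPI_RANK=${OMPI_COMM_WORLD_RANK}
elif [[ -n ${MV2_COMM_WORLD_RANK+z} ]]; then
  # MVAPICH2
  export MPI_RANK=${MV2_COMM_WORLD_RANK}
elif [[ -n ${SLURM_PROCID+z} ]]; then
  # launched via srun (e.g. MPICH under Slurm)
  export MPI_RANK=${SLURM_PROCID}
fi
if [[ ${MPI_RANK} == "0" ]]; then
  # Profile rank 0 only; "$@" keeps each argument as a separate word.
  omniperf profile -n <workload_name> -k <kernel_name> -b <ip_block> -- "$@"
else
  # All other ranks run the application unprofiled.
  "$@"
fi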

It crashes when omniperf (internally, rocprof) tries to collect counters that are split into multiple groups.
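
For reference, the rank detection in the wrapper could be factored into a small helper. This is only a sketch under a few assumptions: the function name detect_mpi_rank is hypothetical, PMI_RANK (set by MPICH's Hydra launcher) is an extra variable not in my original script, and falling back to rank 0 when no launcher variable is set is my own choice. It does not address the crash itself.

```shell
#!/usr/bin/env bash
# Sketch: print the MPI rank of this process based on launcher variables.
# detect_mpi_rank is a hypothetical helper name; PMI_RANK and the "0"
# fallback are assumptions that were not in the original wrapper.
detect_mpi_rank() {
  if [ -n "${OMPI_COMM_WORLD_RANK+z}" ]; then
    # Open MPI
    printf '%s\n' "${OMPI_COMM_WORLD_RANK}"
  elif [ -n "${MV2_COMM_WORLD_RANK+z}" ]; then
    # MVAPICH2
    printf '%s\n' "${MV2_COMM_WORLD_RANK}"
  elif [ -n "${PMI_RANK+z}" ]; then
    # MPICH (Hydra launcher); assumption, see note above
    printf '%s\n' "${PMI_RANK}"
  elif [ -n "${SLURM_PROCID+z}" ]; then
    # launched via srun
    printf '%s\n' "${SLURM_PROCID}"
  else
    # No launcher variable set: treat as a single-process run.
    printf '0\n'
  fi
}
```

In the wrapper, MPI_RANK=$(detect_mpi_rank) would then replace the if/elif chain, and the launch line stays mpirun <...> wrapper_omniperf.sh <...> <exe>.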

gsitaram, Nov 07 '22 18:11