omniperf
Improve documentation for usage with multi-process runs
Could some guidance be added to the documentation on using omniperf with MPI jobs? Should we collect profiles for one rank only, using a wrapper script that profiles just that rank (see the example wrapper below), and invoke it as `mpirun <...> wrapper_omniperf.sh <...> <exe>`? Or should we run `omniperf <...> mpirun <...> <exe>`?
A sample wrapper script that I tried using is:
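```shell
#!/usr/bin/env bash
# Determine this process's rank from the variables set by common launchers.
if [[ -n ${OMPI_COMM_WORLD_RANK+z} ]]; then
  # Open MPI
  export MPI_RANK=${OMPI_COMM_WORLD_RANK}
elif [[ -n ${MV2_COMM_WORLD_RANK+z} ]]; then
  # MVAPICH2
  export MPI_RANK=${MV2_COMM_WORLD_RANK}
elif [[ -n ${SLURM_PROCID+z} ]]; then
  # any MPI launched via srun
  export MPI_RANK=${SLURM_PROCID}
fi

if [[ ${MPI_RANK} == "0" ]]; then
  # Profile rank 0 only; "$@" passes each argument through intact (no eval needed)
  exec omniperf profile -n <workload_name> -k <kernel_name> -b <ip_block> -- "$@"
else
  # All other ranks run the application unprofiled
  exec "$@"
fi
```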
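The rank lookup can also be factored into functions so it can be checked without launching an MPI job. This is only a sketch under assumptions: the function names `mpi_rank` and `should_profile` and the `PROFILE_RANK` variable are hypothetical (not part of omniperf), and it additionally checks `PMI_RANK`, which MPICH's Hydra launcher sets.

```shell
# Hypothetical helper: print this process's MPI rank by checking the
# environment variables that common launchers export, in order.
# Returns 1 (and prints nothing) if none of them is set.
mpi_rank() {
  local var
  for var in OMPI_COMM_WORLD_RANK MV2_COMM_WORLD_RANK PMI_RANK SLURM_PROCID; do
    # ${!var+x} expands to "x" only if the variable named by $var is set
    if [[ -n ${!var+x} ]]; then
      printf '%s\n' "${!var}"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: succeed only if this process is the rank to profile.
# PROFILE_RANK is an assumed variable name (defaults to 0), not an omniperf option.
# If no rank variable is found, the process is not profiled.
should_profile() {
  local rank
  rank=$(mpi_rank) || return 1
  [[ $rank == "${PROFILE_RANK:-0}" ]]
}
```

In the wrapper, the final `if` would then become `if should_profile; then exec omniperf profile -n <workload_name> -k <kernel_name> -b <ip_block> -- "$@"; else exec "$@"; fi`.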
With this wrapper, the run crashes when omniperf (internally, rocprof) tries to collect counters that are split into multiple groups.