
unitrace crashes when using mpiexec

Open flezaalv opened this issue 1 year ago • 2 comments

I launched unitrace in a mpiexec command:

mpiexec -n 12 -ppn 12 --pmi=pmix ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test --output /home/test/test.csv python bin/sr.py

This runs on a single node and creates 12 processes, but when they finish, one process reports this error and the entire mpiexec run fails:

hostname: rank 0 died from signal 15

I also hit the unitrace error described in https://github.com/intel/pti-gpu/issues/25. Is that error the cause of signal 15?

flezaalv avatar Jun 19 '24 16:06 flezaalv

  • Does the app pass without Unitrace?
  • Does it fail even with a smaller number of ranks?
  • Can you share the app and other details to help me reproduce the issue locally?

Sarbojit2019 avatar Jun 21 '24 06:06 Sarbojit2019

  • Does the app pass without Unitrace? Yes. Without Unitrace, the app finishes with return status 0.

  • Does it fail even with a smaller number of ranks? Yes. I tested with mpiexec -n 2 -ppn 2 and got this error:

/run_mpi.sh: line 7: 169430 Segmentation fault      (core dumped) python bin/sr.py
[INFO] Log is stored in /home/test10/results.169391.0.csv
[INFO] Timeline is stored in /home/test10/run_mpi.sh.169391.0.json
hostname: rank 0 exited with code 139
hostname: rank 1 died from signal 15
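
For reference when reading the log above: by shell convention, an exit code greater than 128 means the process was killed by signal (code - 128). So 139 corresponds to signal 11 (SIGSEGV), which matches the segmentation fault line, and signal 15 is SIGTERM. This suggests, though the log does not confirm it, that rank 1 was terminated by the launcher after rank 0 crashed rather than failing on its own. A minimal Python sketch of the decoding follows; the helper name `describe_exit_status` is hypothetical and is not part of unitrace or mpiexec.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Interpret a shell-style exit status.

    Assumes the common shell convention that a status of 128 + N
    means the process was killed by signal N.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


# Exit code reported for rank 0 in the log
print(describe_exit_status(139))
# Signal number reported for rank 1 in the log
print(signal.Signals(15).name)
```
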

run_mpi.sh contains the entire app command. This is the mpiexec command with unitrace included:

mpiexec -n 2 -ppn 2 ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test10/ --output /home/test10/results.csv ./run_mpi.sh

Sure, I will share more details.

Thanks!

flezaalv avatar Jun 26 '24 17:06 flezaalv

@flezaalv With the latest version, do you still have this issue? Can this issue be closed?

zma2 avatar Jul 16 '25 02:07 zma2

Yes, it can be closed.

flezaalv avatar Jul 17 '25 22:07 flezaalv