Displaying source after attaching to an MPI job

Open mpichbot opened this issue 9 years ago • 0 comments

Originally by robl on 2015-09-28 08:34:15 -0500


(Not me: from a message to mpich-discuss from [email protected]: http://lists.mpich.org/pipermail/discuss/2015-September/004180.html )

This question came to me in my support role for TotalView. When you are debugging an MPI job and attach after the job goes parallel, why don't you see the source for the main program? Instead you usually land in assembler code somewhere, in some library above MPI_Init, and you have to go down the stack frames to see the source for your program.

There is a solution, though. Way back when the MPIR_proctable interface was designed to help the debugger acquire processes automatically, a global variable named MPIR_force_to_main was defined. If it is defined in debugger.c as a global variable, then when you start up TotalView and attach, you do see the source for the main program. Here's a diff of the change:

*** debugger.c~ 2015-07-22 17:59:51.000000000 -0400
--- debugger.c  2015-09-22 16:18:21.000000000 -0400
***************
*** 11,17 ****
  int MPIR_proctable_size = 0;
  int MPIR_i_am_starter = 0;
  int MPIR_partial_attach_ok = 0;
!
  volatile int MPIR_debug_state = 0;
  char *MPIR_debug_abort_string = 0;

--- 11,17 ----
  int MPIR_proctable_size = 0;
  int MPIR_i_am_starter = 0;
  int MPIR_partial_attach_ok = 0;
! int MPIR_force_to_main = 0;
  volatile int MPIR_debug_state = 0;
  char *MPIR_debug_abort_string = 0;

The MPI being discussed was Intel MPI, but I've always understood that to be based on MPICH, so I thought this would be a good place to start. I had the same discussion in the Open MPI group a few years ago; they had dropped the variable at one point since it did not appear to be used in any way. The code certainly doesn't make use of it, but when the debugger starts looking for it, that's when the magic happens. I tried just declaring the variable in debugger.h, but I think it got optimized away since I just set up

extern int MPIR_force_to_main;

and didn't assign a value at that point.
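
For readers following along, here is a minimal sketch of the difference between the two attempts, assuming only standard C linkage rules. An `extern` line is a declaration and by itself creates no storage or defined symbol, so there would be nothing for a debugger to find. A file-scope definition in one .c file does create the symbol. The helper `mpir_force_to_main_value` is hypothetical, added only so the value can be inspected, and is not part of MPICH.

```c
/* Sketch only: declaration vs. definition in standard C.
 * mpir_force_to_main_value() is a hypothetical helper, not MPICH code. */

/* Declaration, as it might appear in a header such as debugger.h.
 * On its own it reserves no storage and defines no symbol. */
extern int MPIR_force_to_main;

/* Definition at file scope in exactly one .c file (debugger.c in the
 * diff above). This is what creates the global symbol that a debugger
 * can look up by name. */
int MPIR_force_to_main = 0;

/* Hypothetical helper: returns the current value of the variable. */
int mpir_force_to_main_value(void)
{
    return MPIR_force_to_main;
}
```

One way to check the result, assuming the object file is named debugger.o, is to run `nm debugger.o | grep MPIR_force_to_main`, which should list the symbol once the definition is present.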

mpichbot, Oct 14 '16 19:10