Launch non-MPI processes with MPI_Comm_spawn

Open kmccall882 opened this issue 6 years ago • 9 comments

MPI_Comm_spawn() currently doesn't have a way to launch a non-MPI process, one that will never call MPI_Init(). This would be essential when one needs to use MPI to manage programs that call system() or fork(), since those functions may cause undefined behavior in MPI.

Suggest adding a flag to the MPI_Info argument to MPI_Comm_spawn().

kmccall882 avatar Jan 07 '20 16:01 kmccall882

@kmccall882, have you tried system() or fork()? I think it will probably work.

hzhou avatar May 04 '21 13:05 hzhou

This issue was discussed some months ago, and I don't remember the context. I've been successful at using MPI_Comm_spawn() to create processes. I'm pretty sure that I read that system() and fork() can cause undefined behavior in MPI.

Kurt

kmccall882 avatar May 04 '21 14:05 kmccall882

I've been successful at using MPI_Comm_spawn() to create processes.

MPI_Comm_spawn needs to return an intercommunicator. How would that be possible if the spawned processes include non-MPI processes?

I'm pretty sure that I read that system() and fork() can cause undefined behavior in MPI.

Since processes launched via system() or fork() are not registered with the process manager, they may interfere with the process manager's handling of I/O capturing or signals. But I suspect that for "normal" operations, where the launching (MPI) process reaps its forked children cleanly, it may just work. I am inclined to see what issues arise with direct system() and fork() calls and whether we can address them in the process manager.

hzhou avatar May 04 '21 15:05 hzhou
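
Editor's sketch of the "launching process reaps its forked children cleanly" case described above. It uses only POSIX calls; `run_and_reap` is a hypothetical helper name, not part of MPI or MPICH, and the sketch does not address the process-manager registration concern raised in the comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an MPI or MPICH API): run argv[0] as a child
 * process and block until it has been reaped. Returns the child's exit
 * code, or -1 if the child could not be created or waited for, or if it
 * was terminated by a signal. */
int run_and_reap(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the address space immediately and do nothing else. */
        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp() failed */
    }

    /* Parent: block until this specific child has exited, retrying if the
     * wait is interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, for example `char *cmd[] = {"sh", "-c", "exit 3", NULL};`, and read the child's exit code from the return value. Waiting on the specific pid, rather than on any child, avoids collecting a child that some other part of the program started.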

MPI_Comm_spawn needs to return an inter communicator. How would that be possible if the spawned processes include non-MPI processes?

Sorry Hui, I didn't know the context of the question. It wasn't included in the message I received. Thanks for the info about system() and fork().

Kurt

kmccall882 avatar May 04 '21 15:05 kmccall882

MPI_Comm_spawn needs to return an intercommunicator. How would that be possible if the spawned processes include non-MPI processes?

I believe Open MPI added a non-standard hint to support non-MPI processes. That is, it tells the library not to wait for intercommunicator creation.

Since processes launched via system() or fork() are not registered with the process manager, they may interfere with the process manager's handling of I/O capturing or signals. But I suspect that for "normal" operations, where the launching (MPI) process reaps its forked children cleanly, it may just work. I am inclined to see what issues arise with direct system() and fork() calls and whether we can address them in the process manager.

Some libraries, like libibverbs, are not fork safe in all instances.

raffenet avatar May 04 '21 15:05 raffenet

Some libraries, like libibverbs, are not fork safe in all instances.

Some links about libibverbs and fork support: https://www.rdmamojo.com/2012/05/18/libibverbs/#Fork_safe https://www.rdmamojo.com/2012/05/24/ibv_fork_init/

raffenet avatar May 04 '21 16:05 raffenet

Some libraries, like libibverbs, are not fork safe in all instances.

Some links about libibverbs and fork support: https://www.rdmamojo.com/2012/05/18/libibverbs/#Fork_safe https://www.rdmamojo.com/2012/05/24/ibv_fork_init/

Thanks for the reference. I think the relevant note (man ibv_fork_init) is:

It is not necessary to use this function if all parent process threads are always blocked until all child processes end or change address spaces via an exec() operation.

hzhou avatar May 04 '21 17:05 hzhou
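
Editor's sketch of a launch where the child changes address space via exec straight away and the caller blocks until it ends, which is the pattern the quoted man page note refers to. It uses `posix_spawnp()` so that no application code runs in the child before the exec; `spawn_and_wait` is a hypothetical helper name. Note that only the calling thread blocks here, so whether this meets the "all parent process threads are blocked" condition in a multithreaded MPI program is an open question that the thread does not settle.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not an MPI or libibverbs API): start argv[0] with
 * posix_spawnp(), then block until the child has exited. Returns the
 * child's exit code, or -1 on spawn/wait failure or death by signal. */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    /* NULL file actions and attributes: the child inherits the defaults. */
    int rc = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0) {
        return -1; /* rc holds an errno-style error number */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with a hand-written fork() followed by exec(), this leaves the fork-to-exec window to the C library, at the cost of less control over what happens in the child before the new program starts.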

@raffenet So for this issue, all we need is a non-standard hint so that we skip the intercomm connect/accept?

hzhou avatar Aug 31 '22 13:08 hzhou

Yes, I think adding it is a good idea. Open MPI uses "ompi_non_mpi" as the key. So maybe "mpich_non_mpi"?

raffenet avatar Aug 31 '22 14:08 raffenet