Launch non-MPI processes with MPI_Comm_spawn
MPI_Comm_spawn() currently has no way to launch a non-MPI process, i.e., one that will never call MPI_Init(). This would be essential for using MPI to launch and manage non-MPI programs, because the alternatives, system() and fork(), may cause undefined behavior in an MPI process.
I suggest adding a flag (an info key) to the MPI_Info argument of MPI_Comm_spawn().
@kmccall882, have you tried system() or fork()? I think it will probably work.
This issue was discussed some months ago, and I don't remember the context. I've been successful at using MPI_Comm_spawn() to create processes. I'm pretty sure that I read that system() and fork() can cause undefined behavior in MPI.
Kurt
I've been successful at using MPI_Comm_spawn() to create processes.
MPI_Comm_spawn needs to return an intercommunicator. How would that be possible if the spawned processes include non-MPI processes?
I'm pretty sure that I read that system() and fork() can cause undefined behavior in MPI.
Since processes launched via system() or fork() are not registered with the process manager, they may interfere with the process manager's I/O capturing or signal handling. But I suspect that for "normal" operations, where the launching (MPI) process cleanly reaps its forked children, it may just work. I am inclined to see what issues actually arise with direct system() and fork() calls, and whether we can address them in the process manager.
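Trying system() directly can start from a small standalone C sketch like the one below (no MPI involved; `run_command` is a hypothetical helper name). system() runs the command through /bin/sh and blocks until it exits, so the caller reaps the child itself, and the returned wait status can be decoded with the POSIX `WIFEXITED`/`WEXITSTATUS` macros.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run a shell command with system() and return the
 * command's exit code. system() forks /bin/sh, runs the command, and blocks
 * until the child exits, so no zombie process is left behind.
 * Returns -1 if system() itself failed or the child did not exit normally
 * (for example, it was killed by a signal).
 */
int run_command(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        perror("system");
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1; /* terminated by a signal */
}
```

Calling something like `run_command("hostname")` from one rank of an MPI job would exercise the path discussed here; whether the process manager's I/O capturing or signal handling is disturbed is exactly what such a test would need to observe.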
MPI_Comm_spawn needs to return an intercommunicator. How would that be possible if the spawned processes include non-MPI processes?
Sorry Hui, I didn't know the context of the question; it wasn't included in the message I received. Thanks for the info about system() and fork().
Kurt
From: Hui Zhou; Sent: Tuesday, May 4, 2021 10:33 AM; Subject: Re: [pmodels/mpich] Launch non-MPI processes with MPI_Comm_spawn (#4252)
MPI_Comm_spawn needs to return an intercommunicator. How would that be possible if the spawned processes include non-MPI processes?
I believe Open MPI added a non-standard hint to support non-MPI processes, i.e., a hint telling MPI_Comm_spawn not to wait for intercommunicator creation.
Since processes launched via system() or fork() are not registered with the process manager, they may interfere with the process manager's I/O capturing or signal handling. But I suspect that for "normal" operations, where the launching (MPI) process cleanly reaps its forked children, it may just work. I am inclined to see what issues actually arise with direct system() and fork() calls, and whether we can address them in the process manager.
Some libraries, like libibverbs, are not fork-safe in all cases.
Some links about libibverbs and fork support: https://www.rdmamojo.com/2012/05/18/libibverbs/#Fork_safe https://www.rdmamojo.com/2012/05/24/ibv_fork_init/
Thanks for the references. I think the relevant note (from man ibv_fork_init) is:
It is not necessary to use this function if all parent process threads are always blocked until all child processes end or change address spaces via an exec() operation.
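A minimal fork()/exec() sketch of the pattern that note describes, assuming a POSIX system (`spawn_and_wait` is a hypothetical name): the child calls execvp() immediately after fork(), and the parent blocks in waitpid() until the child exits. Note that waitpid() only blocks the calling thread; in a multithreaded MPI process, the note's condition also requires every other thread to stay blocked, which this sketch does not enforce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] (searched on PATH) in a child process and
 * block until it exits. Returns the child's exit code, 127 if exec failed,
 * or -1 if fork/waitpid failed or the child was killed by a signal.
 */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the process image right away. */
        execvp(argv[0], argv);
        /* Only reached if exec failed; _exit skips atexit handlers and
           stdio buffers inherited from the parent. */
        _exit(127);
    }

    /* Parent: block until the child exits and reap it (no zombie). */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1;
}
```

Calling `_exit` rather than `exit` in the child after a failed exec is the usual choice, since the child still shares copies of the parent's buffers and handlers.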
@raffenet So for this issue, all we need is a non-standard hint so that we skip intercomm connect/accept?
Yes, I think adding it is a good idea. Open MPI uses "ompi_non_mpi" as the key. So maybe "mpich_non_mpi"?