MPI Ports Communication Problems
We keep on having problems with both the MPIPorts and the MPISinglePorts communication. The point of this issue is to document and find the root cause of these problems.
Used MPI Calls
MPI_Open_port()
Opens a port and returns a string referring to it. (docs: OpenMPI, MPICH)
MPI_Close_port()
Closes the port and releases owned resources. (docs: OpenMPI, MPICH)
MPI_Comm_accept()
Given a port and a local comm, this creates an intercommunicator to the requesting side. In OpenMPI, this call is collective on the local comm. (docs: OpenMPI, MPICH)
MPI_Comm_connect()
Given a port and a local comm, this creates an intercommunicator to the accepting side. In OpenMPI, this call is collective on the local comm and requires that MPI_Comm_accept() has been called on all ranks of the accepting communicator. (docs: OpenMPI, MPICH)
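For orientation, here is a minimal, self-contained sketch (not preCICE code) of how these four calls fit together when the port name is exchanged out of band via a file. The file name port.txt and the server/client switch are illustrative only:

```cpp
// Minimal accept/connect sketch. Run one instance with the argument "server"
// and one without; error handling and waiting for port.txt are omitted.
#include <mpi.h>
#include <fstream>
#include <string>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  const bool isServer = (argc > 1 && std::string(argv[1]) == "server");

  char     port[MPI_MAX_PORT_NAME];
  MPI_Comm intercomm;

  if (isServer) {
    MPI_Open_port(MPI_INFO_NULL, port);       // get a port name from the runtime
    std::ofstream("port.txt") << port;        // publish it out of band
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    MPI_Close_port(port);                     // the port is only needed to connect
  } else {
    std::ifstream in("port.txt");             // read the published port name
    in.getline(port, MPI_MAX_PORT_NAME);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
  }

  MPI_Comm_disconnect(&intercomm);
  MPI_Finalize();
}
```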
Our Implementation
MPIPortsCommunication
Similar to the SocketCommunication, this directly connects peers 1-1.
It uses MPI_COMM_SELF on the accepting and connecting side of the calls, creating one communicator of size 1 (the other peer) per connection. As we use MPI_COMM_SELF, this never leads to problems, as there are essentially no collective calls. This is thus a very robust approach, given that the actual underlying communicators are always ignored. Master-Master connections, Master-Slave connections, and Client-Server connections use the same implementation.
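As a simplified illustration (not the actual preCICE code), the accepting side conceptually does the following, ending up with one intercommunicator of remote size 1 per peer:

```cpp
#include <mpi.h>
#include <vector>

// Accept expectedPeers connections on an already opened port, creating one
// intercommunicator per peer, always over MPI_COMM_SELF.
std::vector<MPI_Comm> acceptPeers(const char *port, int expectedPeers) {
  std::vector<MPI_Comm> peers;
  for (int i = 0; i < expectedPeers; ++i) {
    MPI_Comm single;
    // "Collective" only over MPI_COMM_SELF, so no other local rank is involved.
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &single);
    peers.push_back(single); // remote group size of each intercomm is 1
  }
  return peers;
}
```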
MPISinglePortsCommunication
In contrast to MPIPortsCommunication, this communication uses a single communicator for the Client-Server communication. It still needs to rely on 1-1 connections for the Master-Master communication.
The Master-Slave connections are currently implemented as 1-1 connections, but could use a single communicator by splitting the base comm into MasterComm and SlavesComm. This is currently not implemented, though; a possible sketch follows below.
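Taken literally, such a split could look like the following purely hypothetical sketch (not implemented in preCICE; the names masterComm and slavesComm are illustrative only):

```cpp
#include <mpi.h>

// Hypothetical: split a base communicator into a master part (rank 0) and a
// slaves part (all other ranks). Each rank ends up with exactly one of the
// two communicators; the other one stays MPI_COMM_NULL.
void splitBaseComm(MPI_Comm baseComm, MPI_Comm *masterComm, MPI_Comm *slavesComm) {
  int rank;
  MPI_Comm_rank(baseComm, &rank);
  const int color = (rank == 0) ? 0 : 1; // rank 0 is the master
  MPI_Comm split;
  MPI_Comm_split(baseComm, color, rank, &split);
  *masterComm = (rank == 0) ? split : MPI_COMM_NULL;
  *slavesComm = (rank == 0) ? MPI_COMM_NULL : split;
}
```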
Testing branch upstream/add-singleports-tests
The following problems occur in general, but I use a special branch for the tests, which features a full port of the Com Tests to the system introduced in #702. I also cleaned up the comm classes a bit.
Most importantly, I added a sleep prior to the calls to MPI_Comm_connect(). Its duration in milliseconds can be set using an environment variable: export PRECICE_PORTS_WAIT=100.
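The added wait is roughly equivalent to this sketch (no wait if the variable is unset):

```cpp
#include <chrono>
#include <cstdlib>
#include <thread>

// Sleep for PRECICE_PORTS_WAIT milliseconds before calling MPI_Comm_connect().
void waitBeforeConnect() {
  if (const char *value = std::getenv("PRECICE_PORTS_WAIT")) {
    std::this_thread::sleep_for(std::chrono::milliseconds(std::atoi(value)));
  }
}
```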
The Problems
Most of these errors appear occasionally. Thus you probably need to run the command repeatedly to reproduce the errors.
All tests work with MPICH 3.3.2 (Spack on Ubuntu 16.04).
Data unpack would read past end of buffer
This is the same error as #103.
OpenMPI 1.10.2 (Ubuntu 16.04)
[atsccs73:11732] [[12168,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 406
[atsccs73:11732] *** An error occurred in MPI_Comm_accept
[atsccs73:11732] *** reported by process [797442049,0]
[atsccs73:11732] *** on communicator MPI_COMM_SELF
[atsccs73:11732] *** MPI_ERR_UNKNOWN: unknown error
[atsccs73:11732] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[atsccs73:11732] *** and potentially your MPI job)
To reproduce run:
rm -rf precice-run; mpirun -n 4 ./testprecice --run_test=CommunicationTests/MPIPorts
Bad file descriptor
OpenMPI 2.1.1 (Ubuntu 18.04, Spack on Ubuntu 16.04)
Upstream issue: https://github.com/open-mpi/ompi/issues/5520
This occurs occasionally.
To reproduce run:
rm -rf precice-run; mpirun -n 4 ./testprecice --run_test=CommunicationTests/MPIPorts
Hanging in MPI_Comm_accept() and MPI_Comm_connect()
OpenMPI 4.0.3 (Spack on Ubuntu 16.04) Establishing the connection occasionally hangs without an error on both sides and eventually times out.
To reproduce run:
rm -rf precice-run; mpirun -n 4 ./testprecice --run_test=CommunicationTests/MPIPorts/SendReceiveFourProcessesMM
Underlying runtime environment does not support accept/connect functionality
OpenMPI 4.0.3 (Spack on Ubuntu 16.04)
At some point, the occasional hangs described above turn into this one.
This error appears consistently in successive runs after the first time.
Note that I never manually started an ompi-server in any scenario.
--------------------------------------------------------------------------
The user has called an operation involving MPI_Connect and/or MPI_Accept
that spans multiple invocations of mpirun. This requires the support of
the ompi-server tool, which must be executing somewhere that can be
accessed by all participants.
Please ensure the tool is running, and provide each mpirun with the MCA
parameter "pmix_server_uri" pointing to it.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Your application has invoked an MPI function that is not supported in
this environment.
MPI function: MPI_Comm_connect
Reason: Underlying runtime environment does not support accept/connect functionality
--------------------------------------------------------------------------
[atsccs73:26044] *** An error occurred in MPI_Comm_connect
[atsccs73:26044] *** reported by process [1740636161,1]
[atsccs73:26044] *** on communicator MPI_COMM_SELF
[atsccs73:26044] *** MPI_ERR_INTERN: internal error
[atsccs73:26044] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[atsccs73:26044] *** and potentially your MPI job)
To reproduce run:
rm -rf precice-run; mpirun -n 4 ./testprecice --run_test=CommunicationTests/MPIPorts/SendReceiveFourProcessesMM
more to come
Plan of action
Establishing the connection for MPIPorts and MPISinglePorts is a general problem with OpenMPI.
What we know is:
- we don't understand why the problems occur
- we know that the connection works reliably once established
Thus, we could just inform the user with a warning during CMake configuration and advise to restart simulations in case they hang.
We should consider the possibility that we are doing everything correctly and OpenMPI simply does not follow the standard in this case. In the end, MPI Ports the way we use it is a rather rare use case, I guess.
I have mixed feelings about the warning. I guess 90% of our users use OpenMPI and will run into the warning. However, almost all of them will never use any MPI Ports functionality. All tutorials use sockets and we recommend them everywhere. If users now run into any other issue with establishing communication (there are quite a few), they will always come back to this warning, try to install a different MPI implementation, and screw up their system -- even though they actually face a different problem.
A thing we should definitely do is to switch off all problematic tests when CMake detects OpenMPI.
A step further, a bit more aggressive: make MPI Ports an optional feature, switch it off by default and make it require a non-OpenMPI MPI.
Another question: does MPISinglePorts work with MPICH if we use it in all integration tests (#624)?
Then let's take the least invasive option and simply disable the tests when we detect OpenMPI in CMake. Developers can always run the tests directly via the binary.
-- Test precice.com
-- Test precice.com.mpiports - skipped (OpenMPI)
-- Test precice.cplscheme
-- Test precice.io
-- Test precice.m2n
-- Test precice.m2n.mpiports - skipped (OpenMPI)
-- Test precice.mapping
This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:
https://precice.discourse.group/t/mpich-or-openmpi/383/2
This issue has been mentioned on preCICE Forum on Discourse. There might be relevant details there:
https://precice.discourse.group/t/highlights-of-the-new-precice-release-v2-2/429/1
We are now actively testing Intel MPI in the CI #1826.
For Intel MPI, I found that repeated calls to MPI_Comm_accept lead to truncation errors in the underlying openfabrics implementation. Adding a delay of 5 ms worked on my local machine, but failed in the CI. Even a delay of 100 ms doesn't fix this issue. I didn't commit the fix, as it doesn't reliably solve the problem. This only applies to the communication that creates one comm per end-to-end connection. MPISinglePorts works fine.
So we now ignore the mpiports tests for both OpenMPI and Intel MPI.
I stumbled across MPI_Comm_join, which creates an intercommunicator between two processes of different MPI runs that are connected by a socket file descriptor. This sounds a bit like our ConnectionInfoPublisher. Maybe this is worth investigating.
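For illustration, a rough sketch of the connecting side, under the assumption that host and port are exchanged out of band (e.g. via the same file mechanism the ConnectionInfoPublisher uses); the accepting side would mirror this with listen()/accept():

```cpp
#include <mpi.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Turn an established TCP connection into an intercommunicator.
// Error handling is omitted; host must be an IPv4 address here.
MPI_Comm joinAsClient(const char *host, int port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port   = htons(port);
  inet_pton(AF_INET, host, &addr.sin_addr);
  connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));

  MPI_Comm intercomm;
  MPI_Comm_join(fd, &intercomm); // both peers call MPI_Comm_join on their socket
  close(fd);                     // the socket is quiescent after the call
  return intercomm;
}
```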