Improve MPI implementation of serialisation to handle unreliable machines
The current MPI implementation of message serialisation does the following:
- if the message is small (by default, at most 5000 KB), send it in one MPI call;
- otherwise, send it in two MPI calls: the first message contains the total size and the second is the full message.
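The decision described above can be sketched as follows. This is only an illustration, not the framework's actual code: `SendPart`, `PartKind` and `planCurrentSend` are hypothetical names, the threshold is passed in bytes, the comparison is assumed to be inclusive, and the size header is assumed to be a single 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one MPI call made by the sender.
enum class PartKind { SizeHeader, FullMessage };

struct SendPart
{
  PartKind kind;
  std::size_t nb_bytes;
};

// Returns the list of MPI calls needed for a message of 'message_size' bytes.
// Assumption: a message is "small" when message_size <= small_threshold.
std::vector<SendPart> planCurrentSend(std::size_t message_size, std::size_t small_threshold)
{
  assert(small_threshold > 0);
  if (message_size <= small_threshold)
    return { { PartKind::FullMessage, message_size } };
  // Large message: first announce the total size, then send everything at once.
  return { { PartKind::SizeHeader, sizeof(std::uint64_t) },
           { PartKind::FullMessage, message_size } };
}
```

Note that in the second case the whole payload still goes out in a single call, however large it is.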
Some MPI implementations may have (temporary?) problems when there are too many messages or when messages are too big.
To solve this problem, we can try several fixes:
- split the message into multiple packets of a fixed size and send them one by one;
- do not send the full message until the corresponding receive has been posted.
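A minimal sketch of the first idea, assuming the message is a contiguous byte buffer: the helper below only computes the packet boundaries, and each `Packet` would then be passed to one MPI send call. `Packet` and `splitIntoPackets` are hypothetical names, and the packet size is a parameter that would still have to be chosen. For the second idea, a synchronous-mode send such as `MPI_Ssend` or `MPI_Issend` might be a starting point, since it does not complete before the matching receive has started, but whether this is enough on the affected machines is untested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one fixed-size packet inside the full message.
struct Packet
{
  std::size_t offset;   // position of the first byte in the full message
  std::size_t nb_bytes; // packet length; only the last packet may be shorter
};

// Splits a message of 'message_size' bytes into packets of at most 'packet_size' bytes.
std::vector<Packet> splitIntoPackets(std::size_t message_size, std::size_t packet_size)
{
  assert(packet_size > 0);
  std::vector<Packet> packets;
  packets.reserve(message_size / packet_size + 1);
  for (std::size_t offset = 0; offset < message_size; offset += packet_size)
    packets.push_back({ offset, std::min(packet_size, message_size - offset) });
  return packets;
}
```

For example, `splitIntoPackets(10, 4)` yields three packets covering bytes 0-3, 4-7 and 8-9. The receiver can compute the same boundaries from the total size, so only the size header needs to be exchanged beforehand.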