Improve MPI implementation of serialisation to handle unreliable machines

Open grospelliergilles opened this issue 4 years ago • 0 comments

The current MPI implementation of serialize message does the following:

  • send the message in one MPI call if its size is small (by default 5000 kB)
  • otherwise, send the message in two MPI calls. The first message contains the total size and the second message is the full message.
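
The behaviour described above can be sketched as follows. This is only an illustration in plain Python with no MPI calls, not the actual implementation: the names `plan_sends` and `SMALL_MESSAGE_LIMIT` are hypothetical, the default of 5000 is assumed to mean 5000 × 1000 bytes, treating a message exactly at the limit as small is a guess, and the 8-byte little-endian size header is an assumption.

```python
from typing import List, Tuple

# Assumption: the default quoted in the issue is read as 5000 * 1000 bytes.
SMALL_MESSAGE_LIMIT = 5000 * 1000


def plan_sends(payload: bytes, limit: int = SMALL_MESSAGE_LIMIT) -> List[Tuple[str, bytes]]:
    """Return the (label, buffer) pairs a sender would transmit, in order.

    Each pair stands for one MPI send call in the scheme described above.
    """
    if len(payload) <= limit:
        # Small message: a single call carries everything.
        return [("message", payload)]
    # Large message: first the total size, then the full message.
    # The 8-byte little-endian encoding is a hypothetical choice.
    header = len(payload).to_bytes(8, "little")
    return [("size", header), ("message", payload)]
```

For a large payload, the second call still carries the whole message in one piece, which is the case the rest of this issue is concerned with.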

Some MPI implementations may have (possibly temporary) problems when there are too many messages or when the messages are too big.

To solve this problem, we can try several fixes:

  • send the message as multiple packets of fixed size
  • do not send the full message until the corresponding receive has been posted.
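
As a rough sketch of the first proposal, the splitting and reassembly logic could look like the code below. It is plain Python without MPI, and `split_into_packets` and `reassemble` are hypothetical helper names. It assumes the receiver already knows the total size, for example from a preceding size message, and that packets arrive in order.

```python
from typing import Iterable, Iterator


def split_into_packets(payload: bytes, packet_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `payload`, each at most `packet_size` bytes.

    Every packet has the fixed size except possibly the last one.
    """
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    for start in range(0, len(payload), packet_size):
        yield payload[start:start + packet_size]


def reassemble(packets: Iterable[bytes], total_size: int) -> bytes:
    """Concatenate packets in arrival order and check the announced size."""
    data = b"".join(packets)
    if len(data) != total_size:
        raise ValueError(f"expected {total_size} bytes, got {len(data)}")
    return data
```

Each yielded packet would correspond to one send call, so no single MPI message exceeds `packet_size`. The second proposal, waiting until the receive is posted, would need some form of handshake between sender and receiver and is not modelled here.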

grospelliergilles avatar Dec 14 '21 13:12 grospelliergilles