sending 2GB messages in SST-core

Open researcherben opened this issue 1 year ago • 0 comments

When many messages that are scheduled to run at later times on another MPI process are queued, there seems to be a bug once more than 2GB of messages have to be sent. MPI_Isend can send at most about 2GB of bytes per call (its count argument is a C int per the MPI spec), and there is no loop that breaks the message into chunks of at most 2GB. The jobs fail in random ways: sometimes by quietly exiting, other times in an MPI deadlock.
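To illustrate the missing loop, here is a minimal sketch of how a large byte buffer could be split into pieces that each fit in an int count. `planChunks` and `kMaxMpiCount` are hypothetical names, not existing SST-core code, and the sketch assumes the buffer is sent as raw bytes on a platform with a 64-bit `std::size_t`.

```cpp
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Largest byte count that fits in the int count argument of MPI_Isend.
// Hypothetical constant, introduced only for this sketch.
constexpr std::size_t kMaxMpiCount = static_cast<std::size_t>(INT_MAX);

// Hypothetical helper (not part of SST-core): split a buffer of total_bytes
// into (offset, length) pieces, each at most max_chunk bytes long.
// max_chunk is clamped to the range [1, INT_MAX].
std::vector<std::pair<std::size_t, int>>
planChunks(std::size_t total_bytes, std::size_t max_chunk = kMaxMpiCount)
{
    if (max_chunk == 0 || max_chunk > kMaxMpiCount) {
        max_chunk = kMaxMpiCount;
    }

    std::vector<std::pair<std::size_t, int>> chunks;
    std::size_t offset = 0;
    while (offset < total_bytes) {
        const std::size_t remaining = total_bytes - offset;
        const std::size_t len = remaining < max_chunk ? remaining : max_chunk;
        chunks.emplace_back(offset, static_cast<int>(len));
        offset += len;
    }
    return chunks;
}
```

Each (offset, length) pair would then correspond to one MPI_Isend call on `buffer + offset` with `length` bytes. The receiving side would have to post a matching receive for every piece, so the total size still needs to be communicated first.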

Recommendation: in syncQueue.cc, throw an exception when the data to be sent is greater than 2GB. That at least produces an error message when this happens, instead of a hang or an exit with no stated reason.
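A rough sketch of what such a guard might look like follows. `checkedMpiCount` is a hypothetical name, and where exactly it would be called in syncQueue.cc is an assumption; the idea is only to convert the buffer size to the int that MPI expects and fail loudly when it does not fit.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical guard (not existing SST-core code): return the buffer size as
// an int suitable for an MPI count argument, or throw if it is too large.
inline int checkedMpiCount(std::size_t bytes)
{
    if (bytes > static_cast<std::size_t>(INT_MAX)) {
        throw std::length_error(
            "sync buffer of " + std::to_string(bytes) +
            " bytes exceeds the " + std::to_string(INT_MAX) +
            "-byte limit of a single MPI send");
    }
    return static_cast<int>(bytes);
}
```

Calling this right before the send turns the silent failure into an exception whose message names the offending size.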

The result of MPI_Waitall() on the messages is also not checked for errors. I’m not sure a check would catch this failure, but it would be useful not to ignore the error status.

researcherben · May 31 '23 01:05