sst-core
Sending messages larger than 2GB in SST-core
When many messages that are to be delivered at later times on another MPI rank are queued, there appears to be a bug once more than 2GB of message data must be sent. The count argument of MPI_Isend is an int, so per the MPI spec a single send is limited to 2^31 - 1 elements (about 2GB when sending bytes), and there is no loop that breaks the message into chunks of at most 2GB. The jobs fail in unpredictable ways: sometimes they quietly exit, and other times they hang in an MPI deadlock.
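A minimal sketch of the missing chunking loop, assuming the queued messages are serialized into one contiguous byte buffer. `splitIntoChunks` is a hypothetical helper, not an SST-core function; it only computes (offset, length) pairs so that each piece fits in MPI_Isend's int count. Each pair would then be posted as its own MPI_Isend with MPI_BYTE, and the receiver, which has to learn the total size first (for example from a preceding size message), would post matching MPI_Irecv calls. Because MPI messages between the same pair of ranks on the same communicator and tag are non-overtaking, the pieces are matched in order.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not part of SST-core): split a buffer of `totalBytes`
// into (offset, length) pieces, each at most `maxChunk` bytes. Keeping
// maxChunk <= INT_MAX guarantees each length fits MPI_Isend's int count
// when the datatype is MPI_BYTE.
std::vector<std::pair<std::size_t, int>> splitIntoChunks(
        std::size_t totalBytes,
        std::size_t maxChunk = static_cast<std::size_t>(INT_MAX)) {
    if (maxChunk == 0 || maxChunk > static_cast<std::size_t>(INT_MAX)) {
        throw std::invalid_argument("maxChunk must be in [1, INT_MAX]");
    }
    std::vector<std::pair<std::size_t, int>> chunks;
    for (std::size_t offset = 0; offset < totalBytes; offset += maxChunk) {
        std::size_t remaining = totalBytes - offset;
        std::size_t len = remaining < maxChunk ? remaining : maxChunk;
        // Safe cast: len <= maxChunk <= INT_MAX.
        chunks.emplace_back(offset, static_cast<int>(len));
    }
    return chunks;
}
```

With the default limit, a 5,000,000,000-byte buffer splits into three pieces: two of 2,147,483,647 bytes and one of 705,032,706 bytes.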
Recommendation: in syncQueue.cc, throw an exception when the data to be sent exceeds 2GB. This at least produces an error message when the problem occurs, instead of hanging or exiting with no explanation.
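A minimal sketch of the recommended check, assuming the total serialized size is known as a `std::size_t` before the send is posted. `checkSendSize` is a hypothetical name and its placement in syncQueue.cc is an assumption; `std::length_error` is just one reasonable exception type to use here.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical guard (not SST-core's actual code): call before posting a
// single MPI_Isend of MPI_BYTE data. Throws instead of letting an
// oversized count silently break the send.
void checkSendSize(std::size_t bytes) {
    if (bytes > static_cast<std::size_t>(INT_MAX)) {
        throw std::length_error(
            "syncQueue: message of " + std::to_string(bytes) +
            " bytes exceeds the " + std::to_string(INT_MAX) +
            "-byte limit of a single MPI_Isend");
    }
}
```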
The result of MPI_Waitall() is never checked for errors on the completed messages. I'm not sure such a check would catch this particular failure, but it would be better not to ignore the error status.