Improve non-blocking AM performance in MCM mode message-order-fence
The coforall loc in Locales do on loc idiom causes the initiating task to serially invoke non-blocking active messages on the other locales. If a message is too large to inject, the task blocks until the transmission is complete, so that the message buffer is not reused while the provider is still using it.
With this change, non-blocking active messages that are too large to inject are copied by the communication layer, so the initiating task doesn't have to wait for the transmission to complete before sending the next active message. This reduces the time required to initiate the active messages and allows more overlap in their execution.
This functionality is controlled by two environment variables:
CHPL_RT_COMM_OFI_NBAM_THRESHOLD: non-blocking active messages are copied if their message size (in bytes) is less than or equal to this threshold. Note that if CHPL_RT_COMM_OFI_INJECT_AM is set, an active message is injected if it's below the injection threshold.
CHPL_RT_COMM_OFI_NBAM_NUM_BUFS: controls how memory is allocated to hold the copies. If this value is greater than zero, the communication layer allocates the specified number of buffers, each of size CHPL_RT_COMM_OFI_NBAM_THRESHOLD, to hold the active message copies. If this variable is not set or is less than or equal to zero, malloc and free are used to make copies of the messages.
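The interaction of the two variables can be illustrated with a minimal sketch. This is not the runtime's code: the names am_config, choose_send_mode, am_pool, am_copy, and am_release are hypothetical, the injection limit is assumed to be inclusive, and returning NULL when the pool is exhausted is an assumption about what a caller might want rather than documented behavior. The sketch is also not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of how a non-blocking AM is sent. */
typedef enum { AM_INJECT, AM_COPY, AM_BLOCK } am_send_mode;

/* Hypothetical snapshot of the relevant settings. */
typedef struct {
  int    inject_enabled;  /* CHPL_RT_COMM_OFI_INJECT_AM is set */
  size_t inject_limit;    /* injection threshold (assumed inclusive) */
  size_t nbam_threshold;  /* CHPL_RT_COMM_OFI_NBAM_THRESHOLD, in bytes */
  int    nbam_num_bufs;   /* CHPL_RT_COMM_OFI_NBAM_NUM_BUFS; <= 0 selects malloc/free */
} am_config;

/* Inject small messages, copy medium ones, block on anything larger. */
am_send_mode choose_send_mode(const am_config *cfg, size_t size) {
  if (cfg->inject_enabled && size <= cfg->inject_limit) return AM_INJECT;
  if (size <= cfg->nbam_threshold) return AM_COPY;
  return AM_BLOCK;
}

/* Fixed pool of threshold-sized buffers; free_list is NULL in malloc/free mode. */
typedef struct {
  char  *storage;    /* num_bufs * buf_size bytes */
  void **free_list;  /* stack of currently unused buffers */
  int    num_free;
  size_t buf_size;
} am_pool;

/* Returns 0 on success, -1 if the pool could not be allocated. */
int am_pool_init(am_pool *p, const am_config *cfg) {
  memset(p, 0, sizeof(*p));
  p->buf_size = cfg->nbam_threshold;
  if (cfg->nbam_num_bufs <= 0) return 0;  /* malloc/free mode */
  size_t n = (size_t)cfg->nbam_num_bufs;
  p->storage = malloc(n * p->buf_size);
  p->free_list = malloc(n * sizeof(void *));
  if (p->storage == NULL || p->free_list == NULL) {
    free(p->storage);
    free(p->free_list);
    memset(p, 0, sizeof(*p));
    return -1;
  }
  for (size_t i = 0; i < n; i++)
    p->free_list[p->num_free++] = p->storage + i * p->buf_size;
  return 0;
}

/* Copies msg into a buffer the sender no longer owns; NULL if none is available. */
void *am_copy(am_pool *p, const void *msg, size_t size) {
  assert(size <= p->buf_size);  /* callers only copy messages within the threshold */
  void *buf;
  if (p->free_list != NULL) {
    if (p->num_free == 0) return NULL;  /* pool exhausted (assumed policy) */
    buf = p->free_list[--p->num_free];
  } else {
    buf = malloc(size > 0 ? size : 1);
    if (buf == NULL) return NULL;
  }
  memcpy(buf, msg, size);
  return buf;
}

/* Called once the transmission of the copy has completed. */
void am_release(am_pool *p, void *buf) {
  if (p->free_list != NULL) p->free_list[p->num_free++] = buf;
  else free(buf);
}
```

In this sketch a caller would first call choose_send_mode; on AM_COPY it would send the buffer returned by am_copy and call am_release when the transmission completes, which is what lets the initiating task move on to the next active message immediately.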
Implementing these features required changing the completion counter implementation so that it calls fi_cntr_wait only when it is OK for the calling thread to block. fi_cntr_wait does not return until the counter increments, which isn't always desirable.
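The distinction between blocking and polling on a completion counter might look roughly like the following sketch. It deliberately avoids libfabric so that it stays self-contained: mock_cntr, mock_cntr_read, and mock_cntr_wait are hypothetical stand-ins that only mimic the spirit of fi_cntr_read and fi_cntr_wait, and cntr_reached is an illustrative name, not a function from the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a completion counter. */
typedef struct {
  uint64_t value;    /* completions counted so far */
  uint64_t pending;  /* completions that future progress will deliver */
} mock_cntr;

/* Non-blocking read: delivers at most one pending completion, then
 * reports the current value. Stands in for a polling-style read. */
static uint64_t mock_cntr_read(mock_cntr *c) {
  if (c->pending > 0) {
    c->pending--;
    c->value++;
  }
  return c->value;
}

/* Blocking wait: does not return until the counter reaches target.
 * Stands in for a blocking-style wait. */
static void mock_cntr_wait(mock_cntr *c, uint64_t target) {
  while (c->value < target) {
    assert(c->pending > 0);  /* the mock cannot wait on work that never arrives */
    c->pending--;
    c->value++;
  }
}

/* Returns true once the counter has reached target. Blocks only when
 * the caller says blocking is acceptable; otherwise polls once and
 * lets the caller do other work before checking again. */
bool cntr_reached(mock_cntr *c, uint64_t target, bool may_block) {
  if (may_block) {
    mock_cntr_wait(c, target);
    return true;
  }
  return mock_cntr_read(c) >= target;
}
```

A thread that still has active messages to initiate would pass may_block as false and keep working between checks, while a thread with nothing else to do could pass true.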
This PR resolves Cray/chapel-private#3189 and closes Cray/chapel-private#3304 and closes Cray/chapel-private#3337
Signed-off-by: John H. Hartman [email protected]