
Improve non-blocking AM performance in MCM mode message-order-fence

Open • jhh67 opened this issue 2 years ago • 3 comments

The coforall loc in Locales do on loc idiom causes the initiating task to invoke non-blocking active messages on the other locales serially, one after another. If a message is too large to inject, the task blocks until its transmission is complete, so that the message buffer is not reused while the provider may still be reading from it.

With this change, non-blocking active messages that are too large to inject are copied by the communication layer, so the initiating task no longer has to wait for one transmission to complete before sending the next active message. This reduces the time required to initiate the active messages and allows more overlap in their execution.
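A minimal sketch of the copy-before-send idea, in C. Everything here is hypothetical: fake_provider stands in for the libfabric provider, and nb_am_send_copy and provider_complete_all are illustrative names, not the actual comm layer functions. The only point it makes is that once the payload has been copied, the caller can overwrite its own buffer immediately, and the copy is released when the (simulated) transmission completes rather than by the initiating task.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for the provider: it keeps referencing a message
   buffer until the (simulated) transmission completes. */
typedef struct {
  const void *buf[MAX_PENDING]; /* buffers the provider still references */
  size_t len[MAX_PENDING];
  int owned[MAX_PENDING];       /* 1 if the comm layer allocated the copy */
  int count;
} fake_provider;

/* Queue a message for transmission without waiting for it to complete. */
static int provider_send(fake_provider *p, const void *buf, size_t len,
                         int owned) {
  if (p->count == MAX_PENDING) return -1;
  p->buf[p->count] = buf;
  p->len[p->count] = len;
  p->owned[p->count] = owned;
  p->count++;
  return 0;
}

/* Non-blocking AM send that copies the payload first, so the caller can
   reuse its buffer as soon as this returns instead of waiting. */
static int nb_am_send_copy(fake_provider *p, const void *msg, size_t len) {
  assert(msg != NULL && len > 0);
  void *copy = malloc(len);
  if (copy == NULL) return -1;
  memcpy(copy, msg, len);
  if (provider_send(p, copy, len, 1) != 0) {
    free(copy);
    return -1;
  }
  return 0;
}

/* Simulated completion of every pending send: copies owned by the comm
   layer are freed here, not by the task that initiated the send. */
static void provider_complete_all(fake_provider *p) {
  for (int i = 0; i < p->count; i++)
    if (p->owned[i]) free((void *)p->buf[i]);
  p->count = 0;
}
```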

This functionality is controlled by two environment variables:

CHPL_RT_COMM_OFI_NBAM_THRESHOLD: non-blocking active messages are copied if their size (in bytes) is less than or equal to this threshold. Note that if CHPL_RT_COMM_OFI_INJECT_AM is set, an active message is injected if it is below the injection threshold.

CHPL_RT_COMM_OFI_NBAM_NUM_BUFS: controls how memory is allocated to hold the copies. If this value is greater than 0, the communication layer allocates the specified number of buffers, each of size CHPL_RT_COMM_OFI_NBAM_THRESHOLD, to hold the active message copies. If this variable is not set, or is less than or equal to zero, malloc and free are used to make the copies instead.
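The two settings above can be pictured with the following C sketch. All names here (nbam_config, choose_path, nbam_pool, pool_get, pool_put) are hypothetical, not the comm layer's. Several details are assumptions rather than statements from the PR: the defaults of 0, modelling CHPL_RT_COMM_OFI_INJECT_AM as a plain flag, treating a message as injectable when its size is at most the provider's inject size, and having pool_get return NULL when every preallocated buffer is in use (the caller would then fall back to blocking).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { AM_INJECT, AM_COPY, AM_BLOCK } am_path;

typedef struct {
  int inject_am;         /* models CHPL_RT_COMM_OFI_INJECT_AM being set */
  size_t inject_size;    /* provider's inject limit, assumed known */
  size_t nbam_threshold; /* CHPL_RT_COMM_OFI_NBAM_THRESHOLD */
  long nbam_num_bufs;    /* CHPL_RT_COMM_OFI_NBAM_NUM_BUFS */
} nbam_config;

/* Parse a decimal env var; use dflt if it is unset or not a number. */
static long env_long(const char *name, long dflt) {
  const char *s = getenv(name);
  if (s == NULL || *s == '\0') return dflt;
  char *end;
  long v = strtol(s, &end, 10);
  return *end == '\0' ? v : dflt;
}

static void load_config(nbam_config *c) {
  long t = env_long("CHPL_RT_COMM_OFI_NBAM_THRESHOLD", 0); /* assumed default */
  c->nbam_threshold = t > 0 ? (size_t)t : 0;
  c->nbam_num_bufs = env_long("CHPL_RT_COMM_OFI_NBAM_NUM_BUFS", 0);
}

/* Pick how to send a non-blocking AM of `size` bytes. */
static am_path choose_path(const nbam_config *c, size_t size) {
  if (c->inject_am && size <= c->inject_size) return AM_INJECT;
  if (size <= c->nbam_threshold) return AM_COPY;
  return AM_BLOCK; /* too large: wait for the transmission to complete */
}

/* Copy buffers: a fixed pool if num_bufs > 0, otherwise malloc/free. */
typedef struct {
  char *slab;       /* num_bufs * buf_size bytes; NULL in malloc mode */
  void **free_list; /* stack of currently unused pool buffers */
  long nfree;
  size_t buf_size;
} nbam_pool;

static void pool_destroy(nbam_pool *p) {
  free(p->slab);
  free(p->free_list);
  memset(p, 0, sizeof *p);
}

static int pool_init(nbam_pool *p, long num_bufs, size_t buf_size) {
  memset(p, 0, sizeof *p);
  p->buf_size = buf_size;
  if (num_bufs <= 0) return 0; /* malloc/free mode */
  p->slab = malloc((size_t)num_bufs * buf_size);
  p->free_list = malloc((size_t)num_bufs * sizeof(void *));
  if (p->slab == NULL || p->free_list == NULL) {
    pool_destroy(p);
    return -1;
  }
  for (long i = 0; i < num_bufs; i++)
    p->free_list[i] = p->slab + (size_t)i * buf_size;
  p->nfree = num_bufs;
  return 0;
}

/* Returns NULL if the pool is exhausted (assumed: caller then blocks). */
static void *pool_get(nbam_pool *p, size_t size) {
  if (p->slab == NULL) return malloc(size);
  assert(size <= p->buf_size);
  return p->nfree > 0 ? p->free_list[--p->nfree] : NULL;
}

static void pool_put(nbam_pool *p, void *buf) {
  if (p->slab == NULL)
    free(buf);
  else
    p->free_list[p->nfree++] = buf;
}
```

A fixed pool bounds the memory spent on copies and avoids a malloc/free pair per message, at the cost of a hard limit on how many copies can be outstanding at once.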

Implementing these features required changing the completion counter implementation so that fi_cntr_wait is called only when it is OK for the calling thread to block. fi_cntr_wait does not return until the counter reaches the requested value, which is not always desirable.

This PR resolves Cray/chapel-private#3189 and closes Cray/chapel-private#3304 and Cray/chapel-private#3337.

Signed-off-by: John H. Hartman [email protected]

jhh67, May 09 '22 23:05