[Feature]: Renaming signal op

Open yiakwy-xpu-ml-framework-team opened this issue 2 months ago • 3 comments

Suggestion Description

Description

Given the overall strategy of the ROCm libraries in the past, developers need to take a closer look at NVIDIA's nvshmem implementation (the two share software development foundations).

Currently, signal ops such as nvshmemx_signal_op are missing from the library.

We have to replace warp-level signal op with thread-level signal op.

For example, to implement rocshmemx_signal_op, which does not exist in rocshmem, we have written code such as:

...
rocshmem_int_p(dest, value, target_pe);
...

Solution

Implement a rocshmem_int_p-based signal op to immediately catch up with the nvshmem signal op interface.

Full signature :

__device__ inline void rocshmem_signal_op(uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)

Operating System

No response

GPU

No response

ROCm Component

No response

@drprajap Could you have a look at this ?

Hello,

Your proposed rocshmem_int_p workaround is incorrect and will produce randomly wrong results because of an atomicity race condition.

A more correct workaround would use rocshmem_putmem_signal with 0-byte nelems. Note that _wg and _wave variants (equivalent to _block and _warp) are also available.
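
To illustrate the kind of race mentioned above, here is a minimal host-side C++ model. It is not rocSHMEM code. It assumes the problem is a lost update when an "add" style signal is emulated with a separate read followed by a plain put, which is one plausible reading of the race and not a confirmed description of rocshmem_int_p internals. All names (SignalWord, PutStyleAdd, atomic_add, and the helper functions) are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one remote signal word; not a rocSHMEM type.
struct SignalWord {
    uint64_t value = 0;
};

// A put-style "add" split into its two halves so that the interleaving
// between two senders can be written out explicitly.
struct PutStyleAdd {
    uint64_t snapshot = 0;
    // Step 1: fetch the current signal value.
    void read(const SignalWord& s) { snapshot = s.value; }
    // Step 2: put back the snapshot plus the increment (a plain store).
    void write(SignalWord& s, uint64_t inc) { s.value = snapshot + inc; }
};

// An atomic add, modeled here as a single indivisible step.
inline void atomic_add(SignalWord& s, uint64_t inc) { s.value += inc; }

// Two senders each add 1, with their read and write halves interleaved.
inline uint64_t interleaved_put_style() {
    SignalWord sig;
    PutStyleAdd a, b;
    a.read(sig);      // sender A observes 0
    b.read(sig);      // sender B also observes 0
    a.write(sig, 1);  // A stores 1
    b.write(sig, 1);  // B stores 1 as well, so A's increment is lost
    return sig.value;
}

// Two senders each add 1 through the indivisible operation.
inline uint64_t two_atomic_adds() {
    SignalWord sig;
    atomic_add(sig, 1);
    atomic_add(sig, 1);
    return sig.value;
}

// Self-check of the model: the put-style path loses one update.
inline void check_model() {
    assert(interleaved_put_style() == 1);
    assert(two_atomic_adds() == 2);
}
```

In this model the put-style path ends at 1 where a receiver waiting for 2 would hang or misbehave, while the indivisible update reaches 2. On real hardware the interleaving depends on timing, which would match the "randomly wrong" behavior described above.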

abouteiller, Oct 20 '25 20:10

@abouteiller thanks! You are right! I will update with the function name!