Arm Patinyasakdikul
This is somewhat expected. RCCL does not support generic kernels.
CI failure at linking?
Do we still need this? If not, can we close?
[Issue]: RCCL collective call Alltoall is performing way worse than normal MPI Alltoall on Frontier.
Hi, for alltoall, RCCL uses a fan-out algorithm, which is very crude (every rank sends to and receives from every other rank at once), whereas MPI does this in a more structured, algorithmic way. This is the...
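To make the contrast concrete, here is a minimal pure-Python sketch of the two communication patterns. It only models who talks to whom, not RCCL or MPI internals. The function names are hypothetical, and the pairwise-exchange schedule is an assumption: it is one well-known structured alltoall schedule, not necessarily the one a given MPI library picks.

```python
def fan_out_alltoall(send):
    """send[r][p] is the block rank r sends to rank p; returns recv[p][r]."""
    n = len(send)
    recv = [[None] * n for _ in range(n)]
    # Fan-out: every rank posts all of its sends at once,
    # so n * (n - 1) transfers contend for the network simultaneously.
    for src in range(n):
        for dst in range(n):
            recv[dst][src] = send[src][dst]
    return recv


def pairwise_alltoall(send):
    """Same result, but scheduled in n - 1 steps (assumed schedule)."""
    n = len(send)
    recv = [[None] * n for _ in range(n)]
    for r in range(n):
        recv[r][r] = send[r][r]  # local block, no network traffic
    for step in range(1, n):
        # In each step every rank sends to exactly one peer and
        # receives from exactly one peer, which limits link contention.
        for src in range(n):
            dst = (src + step) % n
            recv[dst][src] = send[src][dst]
    return recv


if __name__ == "__main__":
    ranks = 4
    send = [[f"{r}->{p}" for p in range(ranks)] for r in range(ranks)]
    assert fan_out_alltoall(send) == pairwise_alltoall(send)
    print(pairwise_alltoall(send)[2])  # blocks that rank 2 ends up with
```

Both functions deliver the same data; the difference is only in how many transfers are in flight at the same time, which is where the performance gap at scale would come from.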
@jglaser You found me here. Yes, I incorporated this patch in the latest job as you suggested. I hope we get good results back.
I'm closing this PR, as the conversation moved to a meeting and was resolved.
Hi, we have encountered this issue before. It is due to a behavior change in ROCm 7.0 made to match CUDA: we no longer allow certain operations while a graph is being captured. In...
Hi, yes. This should match CUDA behavior.
```
hipFree: Returned hipErrorStreamCaptureUnsupported :
```
From what we see here, RCCL is initializing and tries to call `hipFree()`. I think `hipFree()` is...
LGTM. Please get Nilesh's sign-off on his question regarding the README.