zejunchen-zejun
It works! Thank you!
Hi @wenkaidu @gilbertlee-amd, could you help take a look? Thank you.
Hi @thananon, thank you for your help! `we have encountered this issue before and this is due to ROCm 7.0 behavior change to match CUDA. We no longer allow certain operations...
Hi @amd-nicknick, thank you for your help. We will run the reproducer on a B200 machine and check the behavior.
Hi @amd-nicknick @thananon, I tested the reproducer on B200 and the behavior is `torch.dist.all_reduce cannot be captured by cuda graph on NV platform`. So I think ROCm's behavior does...
Hi @amd-nicknick @thananon, thank you for your help. You are right, it makes perfect sense. The first torch.dist op triggers the lazy initialization of RCCL, which calls `hipFree` internally....
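For anyone landing here later, a minimal sketch of the ordering this implies: run the collective once eagerly so the lazy RCCL initialization (and whatever it frees internally) happens outside graph capture, then capture only the steady-state call. The helper name `capture_with_warmup`, the `main` function, and the tensor size are hypothetical and not taken from the reproducer. It assumes a `torchrun` launch with one GPU per rank, and whether this ordering is sufficient on a given ROCm/PyTorch version is an assumption, not something confirmed in this thread.

```python
from typing import Callable, ContextManager


def capture_with_warmup(
    op: Callable[[], None],
    capture_ctx: Callable[[], ContextManager],
    sync: Callable[[], None] = lambda: None,
) -> None:
    """Run `op` once eagerly, then once more inside the capture context."""
    op()    # eager warm-up: lazy communicator init is triggered here, outside capture
    sync()  # wait for the warm-up to finish before capture begins
    with capture_ctx():
        op()  # only this second call is recorded


def main() -> None:
    # Hypothetical usage; needs PyTorch with GPU support and a torchrun launch.
    import os

    import torch
    import torch.distributed as dist

    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))  # set by torchrun
    dist.init_process_group(backend="nccl")  # "nccl" is backed by RCCL on ROCm builds

    x = torch.ones(1024, device="cuda")
    graph = torch.cuda.CUDAGraph()

    capture_with_warmup(
        op=lambda: dist.all_reduce(x),
        capture_ctx=lambda: torch.cuda.graph(graph),
        sync=torch.cuda.synchronize,
    )

    graph.replay()
    torch.cuda.synchronize()
    dist.destroy_process_group()
```

`main()` is deliberately not invoked in the snippet; call it from a script launched with something like `torchrun --nproc_per_node=2 script.py`. The helper takes plain callables so the warm-up-then-capture ordering can be checked without a GPU.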