zejunchen-zejun

Results 16 comments of zejunchen-zejun

Hi, @thananon Thank you for your help! `we have encountered this issue before and this is due to ROCm 7.0 behavior change to match CUDA. We no longer allow certain operations...

Hi, @amd-nicknick Thank you for your help. We will run the reproducer on a B200 machine and check the behavior.

Hi, @amd-nicknick @thananon I tested the reproducer on a B200, and the behavior is `torch.dist.all_reduce cannot be captured by cuda graph on NV platform`. So I think ROCm's behavior does...

Hi, @amd-nicknick @thananon Thank you for your help. You are right; it makes perfect sense. The first torch.dist op triggers the lazy initialization of RCCL, which calls hipFree internally....
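
The mechanism described in the last comment can be illustrated with a toy model. This is only a sketch in plain Python: `ToyRuntime`, `capture`, and both helper functions are hypothetical names invented for illustration, not PyTorch, RCCL, or HIP APIs. It assumes that a free-style call is rejected while a graph is being captured and that the first collective performs a one-time lazy initialization that makes such a call.

```python
class ToyRuntime:
    """Hypothetical stand-in for a GPU runtime; not a real ROCm or PyTorch API."""

    def __init__(self):
        self.capturing = False         # True while a graph capture is active
        self.comm_initialized = False  # models the lazily created communicator

    def free(self):
        # Models a call such as hipFree; assumed to be rejected during capture.
        if self.capturing:
            raise RuntimeError("free is not permitted during graph capture")

    def all_reduce(self, values):
        # The first call performs lazy initialization, which calls free().
        if not self.comm_initialized:
            self.free()
            self.comm_initialized = True
        return sum(values)


def capture(runtime, op):
    """Run op with the runtime in capture mode, then leave capture mode."""
    runtime.capturing = True
    try:
        return op()
    finally:
        runtime.capturing = False


def first_call_inside_capture():
    # Lazy initialization happens during capture, so free() raises.
    rt = ToyRuntime()
    try:
        capture(rt, lambda: rt.all_reduce([1, 2, 3]))
    except RuntimeError:
        return "capture failed"
    return "capture ok"


def first_call_before_capture():
    # Initialization completes outside capture; the captured call skips free().
    rt = ToyRuntime()
    rt.all_reduce([0])
    try:
        capture(rt, lambda: rt.all_reduce([1, 2, 3]))
    except RuntimeError:
        return "capture failed"
    return "capture ok"
```

In this model, `first_call_inside_capture()` fails because initialization runs under capture, while `first_call_before_capture()` succeeds. Whether issuing a collective before capture is an appropriate fix on real hardware is not confirmed by the comments above.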