Question about ib_write_lat with CUDA

Open yzygitzh opened this issue 2 years ago • 2 comments

I've found that ib_write_lat doesn't support CUDA mode. Is there any intrinsic issue that prevents supporting it? I don't think it is a CUDA limitation, because the NCCL library uses IB write with GPU memory. If there isn't a big obstacle, I can help draft a PR to fix this.

yzygitzh avatar Nov 27 '23 10:11 yzygitzh

Can you share your PR link? I removed the error exit and tried to run it on an A100. It crashed, and gdb showed that the memory being accessed was not host memory, so it could be a CUDA memory issue.

Thanks

elevenxiang avatar Dec 08 '23 10:12 elevenxiang

Hi, sorry for the confusion. I meant that I don't know what the key obstacle to supporting write latency with CUDA is either.

yzygitzh avatar Dec 11 '23 04:12 yzygitzh