DeepSpeedExamples

DeepSpeed op builder

Open wangshuo6699 opened this issue 1 year ago • 4 comments

Thanks for helping to solve this problem.

[Screenshot]

wangshuo6699 · Apr 15 '23 10:04

I encountered this issue before and fixed it by reinstalling CUDA and PyTorch.

The following versions work for me: CUDA Toolkit 11.7 (from the "CUDA Toolkit 11.7 Downloads" page on NVIDIA Developer) and torch==1.13.1.

RSPwFPGAs · Apr 16 '23 13:04
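One way to see whether an environment matches the combination reported as working above is to read torch.__version__ and torch.version.cuda (the CUDA version PyTorch was built with; it is None on CPU-only builds). The sketch below assumes pip wheels may carry a "+cuXXX" local-version suffix; the helper names and the KNOWN_GOOD_* constants are hypothetical, and matching this pair is not guaranteed to fix the error on every system.

```python
from __future__ import annotations

# Hypothetical constants: the pair reported as working in this thread.
KNOWN_GOOD_TORCH = "1.13.1"
KNOWN_GOOD_CUDA = "11.7"


def split_torch_version(version: str) -> tuple[str, str | None]:
    """Split a version string like '1.13.1+cu117' into ('1.13.1', 'cu117')."""
    base, sep, local = version.partition("+")
    return base, (local if sep else None)


def matches_known_good(torch_version: str, torch_cuda: str | None) -> bool:
    """Return True if the torch release and its CUDA build match the known-good pair."""
    base, _ = split_torch_version(torch_version)
    return base == KNOWN_GOOD_TORCH and torch_cuda == KNOWN_GOOD_CUDA


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        # torch.version.cuda is None for CPU-only builds.
        print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
        print("matches known-good pair:",
              matches_known_good(str(torch.__version__), torch.version.cuda))
```

Note that this only inspects the PyTorch side; the CUDA Toolkit that nvcc reports is a separate installation.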

I also use torch 1.13.1 and CUDA 11.7, but it doesn't work!

wangshuo6699 · Apr 17 '23 03:04

There may be some confusion here regarding which CUDA version we are referring to, so I'll clarify. There is a CUDA Runtime API and a CUDA Driver API. Each of these may have different versions depending on how CUDA was installed on your system.

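nvidia-smi reports the Driver version (its "CUDA Version" field is the highest CUDA version the installed driver supports), while nvcc --version reports the Runtime version, i.e. the version of the installed CUDA Toolkit. Please check the CUDA version installed with nvcc --version and confirm that it is >= 11.0 if your PyTorch install was compiled with CUDA 11.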

mrwyattii · Apr 17 '23 16:04
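The two numbers mrwyattii describes can also be read programmatically. This minimal sketch assumes the usual output formats (nvcc printing a line like "Cuda compilation tools, release 11.7, V11.7.99" and the nvidia-smi header containing "CUDA Version: 12.0"); these formats may differ between releases, and the function names are hypothetical. Keep in mind that the nvidia-smi value is the highest CUDA version the driver supports, not the toolkit that will compile DeepSpeed's ops.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Assumed nvcc format: "Cuda compilation tools, release 11.7, V11.7.99"
NVCC_RE = re.compile(r"release (\d+)\.(\d+)")
# Assumed nvidia-smi header format: "Driver Version: 525.85.12    CUDA Version: 12.0"
SMI_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_version(pattern: re.Pattern[str], text: str) -> tuple[int, int] | None:
    """Return (major, minor) from the first match of pattern, or None."""
    match = pattern.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def run(cmd: list[str]) -> str | None:
    """Run a command and return its stdout, or None if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    nvcc_out = run(["nvcc", "--version"])
    smi_out = run(["nvidia-smi"])
    # Toolkit (Runtime) version: what DeepSpeed's op builder compiles against.
    print("Toolkit (nvcc):", parse_version(NVCC_RE, nvcc_out) if nvcc_out else "nvcc not found")
    # Driver-side value: an upper bound, not the installed toolkit.
    print("Driver supports up to:", parse_version(SMI_RE, smi_out) if smi_out else "nvidia-smi not found")
```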

To extend Mike's response above a bit: @wangshuo6699, it appears your nvcc --version reports 10.x, while the torch you are trying to use was compiled with CUDA 11.x. This mismatch will cause significant problems with DeepSpeed, and even without it. NVIDIA recommends aligning these versions exactly; in practice, we often see that as long as the major version (11.x) matches you should be fine, but crossing major versions (such as 10.x vs. 11.x) will unfortunately cause issues.

jeffra · Apr 17 '23 19:04
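The rule of thumb jeffra gives can be written as a small check: strict mode requires the exact (major, minor) match NVIDIA recommends, and the default mode only requires the major versions to agree. This is a hypothetical helper for illustration, not DeepSpeed's own op-builder check, and the version strings in the usage example are illustrative values.

```python
from __future__ import annotations


def parse_major_minor(version: str) -> tuple[int, int]:
    """Parse '11.7' or '11.7.99' into (11, 7)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_versions_compatible(nvcc_version: str, torch_cuda: str, strict: bool = False) -> bool:
    """Compare the nvcc toolkit version with the CUDA version torch was built with.

    strict=True: exact (major, minor) match, as NVIDIA recommends.
    strict=False: only the major versions must agree (the rule of thumb above).
    """
    nvcc = parse_major_minor(nvcc_version)
    torch_ver = parse_major_minor(torch_cuda)
    if strict:
        return nvcc == torch_ver
    return nvcc[0] == torch_ver[0]


if __name__ == "__main__":
    # The situation described above: nvcc reports 10.x, torch was built with 11.x.
    print(cuda_versions_compatible("10.2", "11.7"))               # False
    print(cuda_versions_compatible("11.8", "11.7"))               # True (majors match)
    print(cuda_versions_compatible("11.8", "11.7", strict=True))  # False
```

The default is deliberately lenient to mirror the thread's experience; use strict=True when you want to follow NVIDIA's recommendation literally.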