
Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.1 does not match the version torch was compiled with 11.8

Open • arvind-27 opened this issue 1 year ago • 1 comment

I was running valle.train with the ar.yml config, and the following error came up:

_AttributeError: 'NoneType' object has no attribute 'optimizer_name'_

I installed DeepSpeed version 0.8.3, as suggested by solutions in previous issues. However, I now get another error:

_Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.1 does not match the version torch was compiled with 11.8, unable to compile cuda/cpp extensions without a matching cuda version._
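The exception compares two CUDA versions: the installed toolkit (nvcc) that would compile DeepSpeed's extensions, and the CUDA version PyTorch was built with (torch.version.cuda). Below is a minimal diagnostic sketch, assuming nvcc is found via CUDA_HOME and then PATH (DeepSpeed's own lookup may differ) and that a major.minor comparison is what matters. The helper names are hypothetical.

```python
import os
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return 'major.minor' from `nvcc --version` text such as
    'Cuda compilation tools, release 12.1, V12.1.105', or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_major_minor(system_cuda: str, torch_cuda: str) -> bool:
    """True when two CUDA version strings agree on major.minor ('12.1' vs '11.8' -> False)."""
    return system_cuda.split(".")[:2] == torch_cuda.split(".")[:2]


def find_nvcc() -> Optional[str]:
    """Assumption: prefer $CUDA_HOME/bin/nvcc, then fall back to nvcc on PATH."""
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("nvcc")


def system_cuda_version() -> Optional[str]:
    """CUDA toolkit version reported by nvcc, or None if nvcc is not found."""
    nvcc = find_nvcc()
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[str]:
    """CUDA version torch was compiled with; None if torch is missing or CPU-only."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    system_cuda = system_cuda_version()
    torch_cuda = torch_cuda_version()
    print(f"CUDA toolkit (nvcc): {system_cuda}")
    print(f"CUDA torch was built with: {torch_cuda}")
    if system_cuda and torch_cuda and not same_major_minor(system_cuda, torch_cuda):
        print("Mismatch: this is the condition the DeepSpeed exception reports.")
```

Run it in the same environment that runs valle.train. A pair such as 12.1 vs 11.8 reproduces the situation described in the exception.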

I tried several things to resolve this, including:

- DS_BUILD_OPS=1 pip install deepspeed==0.8.3
- Downgrading PyTorch to match the CUDA version

None of these worked. With the downgraded PyTorch, running the command to train the autoregressive model gives this error:

_OSError: /opt/conda/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev_
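An undefined symbol in libtorchaudio.so usually suggests that torchaudio was built against a different torch release than the one installed, which can happen when torch is downgraded but torchaudio is not. Below is a small sketch that checks whether the installed pair matches. It assumes the pairing rule "torch 1.X goes with torchaudio 0.X, torch 2.X goes with torchaudio 2.X" (check the official torchaudio compatibility table to be sure). The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """'2.0.1+cu118' -> (2, 0): drop the local '+cuXXX' tag, keep major.minor."""
    public = version.split("+")[0]
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def expected_torchaudio(torch_version: str) -> Tuple[int, int]:
    """Assumed pairing: torch 1.X -> torchaudio 0.X; torch 2.X -> torchaudio 2.X."""
    major, minor = major_minor(torch_version)
    return (0, minor) if major == 1 else (major, minor)


def is_matching_pair(torch_version: str, torchaudio_version: str) -> bool:
    """True if torchaudio's major.minor is the one expected for this torch."""
    return major_minor(torchaudio_version) == expected_torchaudio(torch_version)


def installed(dist: str) -> Optional[str]:
    """Installed distribution version, or None if the package is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v, audio_v = installed("torch"), installed("torchaudio")
    print(f"torch={torch_v} torchaudio={audio_v}")
    if torch_v and audio_v and not is_matching_pair(torch_v, audio_v):
        print("torch and torchaudio look mismatched; reinstall them together.")
```

Reading versions through importlib.metadata avoids importing torchaudio, which is the import that fails with the OSError.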

Can someone suggest how to resolve this issue?

arvind-27 • Jun 13 '23 05:06

For this error: _Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.1 does not match the version torch was compiled with 11.8, unable to compile cuda/cpp extensions without a matching cuda version._

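Run pip uninstall torch torchaudio, then reinstall torch and torchaudio together from a build compiled for CUDA 12.1 (the +cu121 wheels). CUDA 12.1 wheels are only published for torch 2.1.0 and later, for example: pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121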
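PyTorch wheels carry a local version tag naming the CUDA version they were built with: cu121 means CUDA 12.1, cu118 means CUDA 11.8. Below is a sketch that derives that tag and composes uninstall/reinstall commands for torch and torchaudio as a pair. It assumes pip's --index-url with PyTorch's per-CUDA wheel index; reinstall_commands is a hypothetical helper, and it only prints the commands rather than running pip.

```python
from typing import List


def cuda_tag(cuda_version: str) -> str:
    """'12.1' -> 'cu121', '11.8' -> 'cu118' (PyTorch's wheel tag naming)."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def reinstall_commands(torch_version: str, torchaudio_version: str, cuda_version: str) -> List[str]:
    """Hypothetical helper: uninstall both packages, then reinstall them
    together from the PyTorch wheel index for the given CUDA version."""
    index_url = f"https://download.pytorch.org/whl/{cuda_tag(cuda_version)}"
    return [
        "pip uninstall -y torch torchaudio",
        f"pip install torch=={torch_version} torchaudio=={torchaudio_version} --index-url {index_url}",
    ]


if __name__ == "__main__":
    # Example values only: 2.1.0 is used because it ships cu121 wheels.
    for cmd in reinstall_commands("2.1.0", "2.1.0", "12.1"):
        print(cmd)
```

Keeping torch and torchaudio in a single install command makes it harder to end up with the mismatched pair that triggers the libtorchaudio undefined-symbol error.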

For more information on this error, see this Stack Overflow post: https://stackoverflow.com/questions/66116155/how-to-tell-pytorch-which-cuda-version-to-take

JonathanColetti • Jun 16 '23 15:06