vall-e
Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.1 does not match the version torch was compiled with 11.8
I was running valle.train with ar.yml and got this error:
_AttributeError: 'NoneType' object has no attribute 'optimizer_name'_
I installed deepspeed version 0.8.3, as suggested in previous issues. However, I now get a different error:
_Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.1 does not match the version torch was compiled with 11.8, unable to compile cuda/cpp extensions without a matching cuda version._
I tried several things to resolve this, including:
- Reinstalling with DS_BUILD_OPS=1 pip install deepspeed==0.8.3
- Downgrading PyTorch to match the CUDA version

Neither worked. Downgrading PyTorch gives this error when I run the training command for the autoregressive model:
_OSError: /opt/conda/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev_
Can someone suggest how to resolve this?
For this error: _Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.1 does not match the version torch was compiled with 11.8, unable to compile cuda/cpp extensions without a matching cuda version._
First uninstall the existing packages:
pip uninstall torch torchaudio
Then install builds that match your installed CUDA version (12.1):
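pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
(cu121 wheels are published for torch 2.1 and later. Installing torchaudio from the same index keeps it built against the same torch version, which should also avoid the undefined symbol error from libtorchaudio.so.)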
For more background on how PyTorch picks its CUDA version, see this Stack Overflow post: https://stackoverflow.com/questions/66116155/how-to-tell-pytorch-which-cuda-version-to-take
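
Before reinstalling anything, it can help to confirm which two CUDA versions are actually in conflict. Below is a minimal diagnostic sketch, not DeepSpeed's own check. It assumes the `nvcc` found on PATH belongs to the toolkit DeepSpeed picks up (DeepSpeed may instead use the one under CUDA_HOME). The helper names (`parse_nvcc_release`, `system_cuda_version`, `torch_cuda_version`, `versions_match`) are made up for illustration; only `torch.version.cuda` is a real PyTorch attribute.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' CUDA release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def system_cuda_version():
    """Ask the nvcc on PATH for its CUDA release; None if nvcc is not found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """CUDA version the installed torch wheel was built with; None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


def versions_match(system, torch_built):
    """True only when both versions are known and share the same major.minor."""
    if system is None or torch_built is None:
        return False
    return system.split(".")[:2] == torch_built.split(".")[:2]


if __name__ == "__main__":
    sys_cuda = system_cuda_version()
    built = torch_cuda_version()
    print(f"system CUDA (nvcc): {sys_cuda}")
    print(f"torch built with  : {built}")
    print("match" if versions_match(sys_cuda, built) else "mismatch or unknown")
```

If the two printed versions differ, as with 12.1 and 11.8 in the error above, either the torch wheel or the CUDA toolkit has to change so that they agree.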