Traceback (most recent call last):
  File "/ceph/home/tong01/wyf/COT-Coder-master/unsloth_grpo.py", line 25, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
  File "/ceph/home/tong01/wyf/unsloth/unsloth/models/loader.py", line 292, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
  File "/ceph/home/tong01/wyf/unsloth/unsloth/models/qwen2.py", line 87, in from_pretrained
    return FastLlamaModel.from_pretrained(
  File "/ceph/home/tong01/wyf/unsloth/unsloth/models/llama.py", line 1798, in from_pretrained
    llm = load_vllm(**load_vllm_kwargs)
  File "/ceph/home/tong01/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth_zoo/vllm_utils.py", line 1003, in load_vllm
    raise RuntimeError(error)
RuntimeError: /ceph/home/tong01/miniconda3/envs/unsloth/lib/python3.11/site-packages/torchvision.libs/libcudart.7ec1eba6.so.12 (deleted): cannot open shared object file: No such file or directory
That most likely means your computer doesn't have CUDA - try installing cudatoolkit
@danielhanchen I have CUDA
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
torch.__version__
'2.5.1+cu121'
torch.cuda.is_available()
True
Besides, after I removed torchvision with pip uninstall torchvision and reran the code, I got the following error:
Traceback (most recent call last):
  File "/ceph/home/tong01/wyf/COT-Coder-master/unsloth_grpo.py", line 25, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
  File "/ceph/home/tong01/wyf/unsloth/unsloth/models/loader.py", line 292, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
  File "/ceph/home/tong01/wyf/unsloth/unsloth/models/qwen2.py", line 87, in from_pretrained
    return FastLlamaModel.from_pretrained(
  File "/ceph/home/tong01/wyf/unsloth/unsloth/models/llama.py", line 1798, in from_pretrained
    llm = load_vllm(**load_vllm_kwargs)
  File "/ceph/home/tong01/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth_zoo/vllm_utils.py", line 1003, in load_vllm
    raise RuntimeError(error)
RuntimeError: /ceph/home/tong01/miniconda3/envs/unsloth/lib/python3.11/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12 (deleted): cannot open shared object file: No such file or directory
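A note on the "(deleted)" suffix in both errors: on Linux, that is the marker the kernel appends to a pathname in /proc/<pid>/maps when a mapped file has been unlinked or replaced on disk after the process loaded it. Whether that is what happened here (for example, a package being reinstalled into the env) is an assumption; the traceback alone does not confirm it. A minimal sketch to check from inside the same Python environment, where deleted_mappings is a hypothetical helper and not part of unsloth or vLLM:

```python
from pathlib import Path

DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text, needle="libcudart"):
    """Return paths of mapped files containing `needle` that are marked deleted."""
    hits = set()
    for line in maps_text.splitlines():
        # Fields per line: address perms offset dev inode pathname
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping: no pathname field
        pathname = parts[5].strip()
        if needle in pathname and pathname.endswith(DELETED_SUFFIX):
            hits.add(pathname[: -len(DELETED_SUFFIX)])
    return sorted(hits)


if __name__ == "__main__":
    try:
        # Importing a CUDA build of torch typically maps the CUDA runtime
        # into this process; if torch is missing or broken we just skip it.
        import torch  # noqa: F401
    except Exception:
        pass
    maps = Path("/proc/self/maps")
    if maps.exists():  # /proc/self/maps only exists on Linux
        for path in deleted_mappings(maps.read_text()):
            print("mapped but deleted on disk:", path)
```

If this prints a libcudart path, the file was removed or replaced while it was still mapped, and restarting the Python process after fixing the install is worth trying before anything else.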
Oh my, it seems like maybe all of torch might be broken :(
I.e. Conda is not recognising the correct CUDA path.
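To see which libcudart copies the conda env actually contains (the two paths in the errors above sit under torchvision.libs and nvidia/cuda_runtime/lib), something like the following may help. It is only a sketch: find_cudart is a hypothetical helper, and it assumes the pip-installed CUDA wheels live under the env's site-packages directory.

```python
import sysconfig
from pathlib import Path


def find_cudart(root):
    """Return (path, exists) for every libcudart*.so* entry below `root`."""
    results = []
    for path in sorted(Path(root).rglob("libcudart*.so*")):
        # exists() follows symlinks, so a dangling symlink reports False
        results.append((str(path), path.exists()))
    return results


if __name__ == "__main__":
    # "purelib" is the site-packages directory of the running interpreter
    site_packages = sysconfig.get_paths()["purelib"]
    for path, ok in find_cudart(site_packages):
        print("OK     " if ok else "MISSING", path)
```

Run it with the env's own interpreter (the one under miniconda3/envs/unsloth). If nothing is listed, or entries show as MISSING, reinstalling torch into a fresh env is probably quicker than repairing this one.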