peft
RuntimeError: CUDA error: invalid device ordinal
System Info
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.11/site-packages/accelerate/state.py", line 211, in __init__
    torch.cuda.set_device(self.device)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.11/site-packages/torch/cuda/__init__.py", line 350, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder
- [ ] My own task or dataset (give details below)
Reproduction
Run the script `examples/sft/run_peft_deepspeed.sh`.
Expected behavior
The script runs to completion without errors.
@pacman100 do you have an idea?
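One thing that may be worth ruling out: `invalid device ordinal` is what CUDA reports when a process asks for a GPU index that is not among the devices visible to it. A common way to hit this (an assumption here, since the issue does not list the GPU count or the launch config) is launching more processes than there are visible GPUs, or restricting `CUDA_VISIBLE_DEVICES` to fewer devices than the launcher expects. The sketch below is only a diagnostic, not a fix from the PEFT or Accelerate maintainers. The helper names `count_visible_devices` and `ordinal_is_valid` are hypothetical; `LOCAL_RANK` is the per-process rank variable set by distributed launchers such as `torchrun` and `accelerate launch`.

```python
import os


def count_visible_devices(value):
    """Count entries in a CUDA_VISIBLE_DEVICES string.

    Returns None when the variable is unset (all GPUs are visible).
    Simplification: every non-empty comma-separated entry is counted.
    """
    if value is None:
        return None
    return len([entry for entry in value.split(",") if entry.strip()])


def ordinal_is_valid(local_rank, device_count):
    """A device index is usable only if 0 <= index < number of visible GPUs."""
    return 0 <= local_rank < device_count


def main():
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    try:
        import torch  # optional: gives the authoritative visible-GPU count
        device_count = torch.cuda.device_count()
    except ImportError:
        # Fallback (assumption): trust CUDA_VISIBLE_DEVICES if torch is missing.
        device_count = count_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES")) or 0

    print(f"LOCAL_RANK={local_rank}, visible GPUs={device_count}")
    if not ordinal_is_valid(local_rank, device_count):
        print("This rank has no matching GPU; torch.cuda.set_device would fail.")


if __name__ == "__main__":
    main()
```

If the check fails for some rank, comparing the number of processes requested in the launch configuration used by `run_peft_deepspeed.sh` against the GPU count reported here would be a reasonable next step.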
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.