
RuntimeError: CUDA error: invalid device ordinal

Open yumath opened this issue 1 year ago • 2 comments

System Info

  File "/home/xxx/anaconda3/envs/xxx/lib/python3.11/site-packages/accelerate/state.py", line 211, in __init__
    torch.cuda.set_device(self.device)
  File "/home/xxx/anaconda3/envs/xxx/lib/python3.11/site-packages/torch/cuda/__init__.py", line 350, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
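
A note on reading this traceback: CUDA raises "invalid device ordinal" when a process asks for a GPU index that is not among the devices visible to it. One plausible cause here, which the report does not confirm, is that the launcher started more processes than there are visible GPUs, so some process's local rank is out of range. The sketch below is a small diagnostic under that assumption. The helper names `ordinal_is_valid` and `describe` are hypothetical, and reading the rank from the `LOCAL_RANK` environment variable assumes a torchrun-style launcher.

```python
import os


def ordinal_is_valid(local_rank: int, visible_gpus: int) -> bool:
    """Return True if `local_rank` is a usable CUDA device index."""
    return 0 <= local_rank < visible_gpus


def describe(local_rank: int, visible_gpus: int) -> str:
    """Build a one-line diagnostic for a given rank and GPU count."""
    if ordinal_is_valid(local_rank, visible_gpus):
        return f"rank {local_rank}: ok ({visible_gpus} GPU(s) visible)"
    return (
        f"rank {local_rank}: invalid device ordinal "
        f"(only {visible_gpus} GPU(s) visible)"
    )


if __name__ == "__main__":
    # Assumption: the launcher exports LOCAL_RANK; default to 0 when run directly.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    try:
        import torch

        # Counts only the GPUs this process can see (respects CUDA_VISIBLE_DEVICES).
        visible = torch.cuda.device_count()
    except ImportError:
        # Without PyTorch installed, no GPU count can be obtained here.
        visible = 0
    print(describe(local_rank, visible))
```

If a rank is reported as out of range, the number of launched processes and the value of `CUDA_VISIBLE_DEVICES` are the first things to compare.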

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder
  • [ ] My own task or dataset (give details below)

Reproduction

Run the script examples/sft/run_peft_deepspeed.sh.

Expected behavior

The script runs successfully.

yumath, Apr 16 '24 03:04

@pacman100 do you have an idea?

BenjaminBossan, Apr 16 '24 12:04

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

github-actions[bot], May 16 '24 15:05