Unable to use multiple GPUs for training under WSL
Reminder
- [X] I have read the README and searched the existing issues.
System Info
- llamafactory version: 0.9.1.dev0
- Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.11.9
- PyTorch version: 2.4.1+cu121
- Transformers version: 4.45.0
- Datasets version: 2.21.0
- Accelerate version: 0.34.2
- PEFT version: 0.12.0
- TRL version: 0.9.6
NVIDIA driver:

```
$ nvidia-smi
Fri Sep 27 17:46:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 560.81       CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:1B:00.0 Off |                  Off |
| 30%   47C    P2            111W / 450W  |  20732MiB / 24564MiB |     32%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  | 00000000:1E:00.0 Off |                  Off |
| 30%   31C    P8             19W / 450W  |      0MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        On  | 00000000:89:00.0 Off |                  Off |
| 30%   31C    P8             22W / 450W  |      0MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        On  | 00000000:8C:00.0 Off |                  Off |
| 30%   31C    P8             18W / 450W  |     51MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```
CUDA:

```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```
Reproduction
The default check (without specifying a GPU) fails:

```
/mnt/d/AI-WSL/LLaMA-Factory$ llamafactory-cli version
/home/ggec/miniconda3/envs/factory/lib/python3.10/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory (Triggered internally at /opt/conda/conda-bld/pytorch_1724789115765/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
[2024-09-27 17:13:35,737] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-09-27 17:13:35,748] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
| Welcome to LLaMA Factory, version 0.9.1.dev0           |
|                                                        |
| Project page: https://github.com/hiyouga/LLaMA-Factory |
```
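The same detection failure can be reproduced with plain PyTorch, independent of LLaMA-Factory. A minimal check, run inside the same `factory` conda environment (whether it prints `0` or raises the same "out of memory" error under WSL2 is my assumption based on the log above):

```bash
# Without CUDA_VISIBLE_DEVICES, device enumeration appears to fail under WSL2;
# with CUDA_VISIBLE_DEVICES=0 it should report one device.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```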
With a single GPU specified, the check passes:

```
/mnt/d/AI-WSL/LLaMA-Factory$ CUDA_VISIBLE_DEVICES=0 llamafactory-cli version
[2024-09-27 17:14:05,213] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/home/ggec/miniconda3/envs/factory/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  def forward(ctx, input, weight, bias=None):
/home/ggec/miniconda3/envs/factory/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  def backward(ctx, grad_output):
| Welcome to LLaMA Factory, version 0.9.1.dev0           |
|                                                        |
| Project page: https://github.com/hiyouga/LLaMA-Factory |
```
Training also works when a single GPU device is specified:

```
/mnt/d/AI-WSL/LLaMA-Factory$ CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```
Expected behavior
At the moment I can only train with a single specified GPU device. How can I train with multiple GPUs?
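For reference, this is the kind of multi-GPU launch I would expect to work, sketched from the LLaMA-Factory README, where `FORCE_TORCHRUN=1` makes `llamafactory-cli` launch training through torchrun (the device list here is an assumption for my 4-GPU setup):

```bash
# Expose all four GPUs and force a torchrun-based distributed launch.
CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 \
  llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```

Under WSL2 this presumably fails at the same CUDA device enumeration step shown in the Reproduction section.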
Others
No response