
Cannot run with multiple GPUs under WSL


Reminder

  • [X] I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • PyTorch version: 2.4.1+cu121
  • Transformers version: 4.45.0
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6

NVIDIA driver:

nvidia-smi
Fri Sep 27 17:46:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06              Driver Version: 560.81        CUDA Version: 12.6   |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:1B:00.0 Off |                  Off |
| 30%  47C   P2             111W / 450W   | 20732MiB / 24564MiB  |     32%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  | 00000000:1E:00.0 Off |                  Off |
| 30%  31C   P8              19W / 450W   |     0MiB / 24564MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        On  | 00000000:89:00.0 Off |                  Off |
| 30%  31C   P8              22W / 450W   |     0MiB / 24564MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        On  | 00000000:8C:00.0 Off |                  Off |
| 30%  31C   P8              18W / 450W   |    51MiB / 24564MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

CUDA toolkit:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
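For reference, the CUDA version that the installed PyTorch wheel was built against can be checked with the snippet below. This is my own diagnostic, not part of the original logs, and it assumes python resolves to the same factory conda env:

# Print the CUDA version PyTorch was built with (expected 12.1 here)
# and whether CUDA can be initialized at all in this environment.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"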

Reproduction

The default check fails:

/mnt/d/AI-WSL/LLaMA-Factory$ llamafactory-cli version
/home/ggec/miniconda3/envs/factory/lib/python3.10/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory (Triggered internally at /opt/conda/conda-bld/pytorch_1724789115765/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
[2024-09-27 17:13:35,737] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-09-27 17:13:35,748] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)

| Welcome to LLaMA Factory, version 0.9.1.dev0            |
|                                                         |
| Project page: https://github.com/hiyouga/LLaMA-Factory  |
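To narrow down where device enumeration breaks, here is a minimal sketch (my own assumption, same conda env) comparing what PyTorch sees with and without device masking:

# Without masking, cudaGetDeviceCount() fails under WSL2
# with the "Error 2: out of memory" above.
python -c "import torch; print(torch.cuda.device_count())"

# With a single GPU exposed, enumeration succeeds.
CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.device_count())"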

With a GPU explicitly specified, the check passes:

/mnt/d/AI-WSL/LLaMA-Factory$ CUDA_VISIBLE_DEVICES=0 llamafactory-cli version
[2024-09-27 17:14:05,213] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/home/ggec/miniconda3/envs/factory/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  def forward(ctx, input, weight, bias=None):
/home/ggec/miniconda3/envs/factory/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  def backward(ctx, grad_output):

| Welcome to LLaMA Factory, version 0.9.1.dev0            |
|                                                         |
| Project page: https://github.com/hiyouga/LLaMA-Factory  |

Training works when a single GPU device is specified:

/mnt/d/AI-WSL/LLaMA-Factory$ CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

Expected behavior

Right now training only works when a single GPU device is specified. How can I run training on multiple GPUs?
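For reference, this is the kind of multi-GPU launch I would expect to work, modeled on the examples in the project README (the device list and the FORCE_TORCHRUN variable are assumptions on my side; neither launch runs successfully under WSL at present):

# Expose all four GPUs; under WSL this currently hits the CUDA init error above.
CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

# Explicit distributed launch via torchrun, as shown in the README.
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml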

Others

No response
