[Chatllama]: MultiGPU support for training
I'm trying to train the actor model (BLOOM 1.5B) on a multi-GPU setup (3x V100s). When I observe GPU usage, only GPU:0 is utilized, and I run out of memory if I increase the batch_size.
Could you add multi-GPU support using HuggingFace's Accelerate, to make it possible to train larger models with a larger batch size?
Thank you
Hi @TejaGollapudi, thank you very much for reaching out. We are currently working on supporting the Accelerate library. You can follow the updates directly on PR #233.
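For anyone unfamiliar with what the Accelerate integration would look like, here is a minimal sketch of the standard HF Accelerate training pattern. The `model`, `optimizer`, and `train_loader` objects are placeholders, not chatllama's actual classes, so treat this as an illustration of the mechanism rather than the code in the PR.

```python
# Minimal sketch of the usual HF Accelerate pattern; all objects below are
# stand-ins, not chatllama's actual actor/trainer classes.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(16, 2)  # stand-in for the actor model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))),
    batch_size=8,
)

# prepare() moves everything to the right device on each process and wraps
# the model for distributed training across all available GPUs.
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

for inputs, labels in train_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```

A script written this way is started with `accelerate launch` (or `deepspeed`) rather than plain `python`, which is what spawns one process per GPU.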
I added Accelerate to the code as in #233, but got an error:
```
Traceback (most recent call last):
  File "/nvmessd0/nebullvm/apps/accelerate/chatllama/artifacts/main.py", line 3, in
```
@leonselina We will be releasing support for Accelerate very soon! We are currently testing the code and will keep you updated when we merge the code!
When will this multi-GPU support be available? Really looking forward to it.
Also looking forward to it!
Hi everyone @bin123apple @balcklive @TejaGollapudi, you can try PR #306, where DeepSpeed and Accelerate should be working fine. Keep in mind to launch the training with `deepspeed artifacts/main.py ...` or `accelerate launch ...` instead of using `python`. If you have any other problems on the matter, let me know!
Hi @PierpaoloSorbellini, I trained LLaMA 7B with DeepSpeed but got the error "MP=1 but world size is 2". How can I train LLaMA 7B on multiple GPUs? Because of the VRAM limits, maybe I should use model parallelism instead of data parallelism for multi-GPU training. Thanks :)
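For context, here is a hedged sketch of where that error message typically comes from in the original LLaMA example loader (this may not be chatllama's exact code): the model-parallel size is inferred from the number of checkpoint shards, and it must equal the number of launched processes. The 7B checkpoint ships as a single shard, so launching two processes against it trips the assertion.

```python
# Sketch of the MP / world-size check in the native LLaMA checkpoint loader
# (an assumption about the code path, shown only to explain the error text).
import glob
import os

def checkpoint_model_parallel_size(ckpt_dir: str) -> int:
    """Number of model-parallel shards (consolidated.*.pth) in a LLaMA checkpoint dir."""
    return len(glob.glob(os.path.join(ckpt_dir, "consolidated.*.pth")))

def assert_world_size_matches(ckpt_dir: str) -> None:
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    mp = checkpoint_model_parallel_size(ckpt_dir)  # 7B has a single shard, so MP = 1
    assert world_size == mp, f"MP={mp} but world size is {world_size}"
```

So with the native 7B weights you would either launch a single process, reshard the checkpoint, or rely on a different parallelism strategy (e.g. DeepSpeed ZeRO) instead of the built-in model parallelism.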
@PierpaoloSorbellini Hey, I tried LLaMA in HF format and used DeepSpeed with --num_gpus=2. The model was loaded twice, and both copies ended up on the rank-0 GPU, which caused a CUDA OOM.
Do you have any ideas on how to fix this?
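One common cause of this symptom is that each spawned process builds the model without being pinned to its own device, so every replica lands on cuda:0. A hedged sketch of the usual fix (the checkpoint path is a placeholder, and this is not necessarily how chatllama constructs the model):

```python
# Sketch: pin each DeepSpeed/Accelerate rank to its own GPU via LOCAL_RANK
# before creating or moving the model, so rank 1 loads onto cuda:1 rather
# than piling onto cuda:0.
import os
import torch
from transformers import AutoModelForCausalLM

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("/path/to/llama-7b-hf")  # hypothetical path
model = model.to(torch.device("cuda", local_rank))
```

If the 7B weights still don't fit per GPU even with correct placement, that points back to needing ZeRO offload/partitioning or model parallelism rather than plain data parallelism.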