starcoder
finetune time
Using an A800 80G, how long does it take to finetune? I am stuck...
I have the same problem: one finetuning step takes about one hour (8 A800 80G GPUs)
How do I use multiple GPUs? I have tried CUDA_VISIBLE_DEVICES, with no effect.
I used DeepSpeed:

```bash
deepspeed --num_gpus=8 finetune/finetune.py --other args
```
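For reference, this is roughly how a DeepSpeed config plugs into a Trainer-based script like finetune.py. A minimal sketch only, assuming a ZeRO stage-2 setup; the exact values here are illustrative, not the repo's official config:

```python
# Minimal sketch: pass a DeepSpeed config to the Hugging Face Trainer.
# ZeRO stage 2 shards optimizer state and gradients across the GPUs,
# and every GPU runs the full forward/backward pass (data parallelism),
# unlike accelerate's naive device_map splitting.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # optional: trade speed for memory
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",  # "auto" lets the Trainer fill these in
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)
```

You still launch with the `deepspeed` command above; the config just tells the Trainer how to shard the training state.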
> I have the same problem: one finetuning step takes about one hour (8 A800 80G GPUs)
I think the problem is that 'accelerate', although it distributes the weights across different GPUs, does NOT distribute the computational load.
If you look at the nvidia-smi output while training, you will see only ONE GPU active at a time.
So basically only 1/8 of your pretty beefy setup is being used at any given moment.
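If I read it right, this is the naive model parallelism you get from loading with `device_map="auto"`: the layers are split across GPUs and executed sequentially, so only the GPU holding the current layer does any work. A sketch of that loading path, assuming the usual transformers API:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" places consecutive layer blocks on different GPUs.
# A forward pass then flows GPU 0 -> GPU 1 -> ... one stage at a time,
# which is why nvidia-smi shows only one GPU busy at any moment.
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    device_map="auto",  # splits memory across GPUs, not compute
)
```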
If that's the case, how do I set up the configuration?
Actually I have to shard my model across multiple GPUs with accelerate, otherwise I get a CUDA OOM error when fine-tuning on 4 RTX 6000 48 GB cards. However, accelerate can waste significant memory, as mentioned above, and I still hit OOM when increasing seq_length to 8192. I am interested in fine-tuning with both DeepSpeed and accelerate, but does anyone know how to implement this setup?
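What I have in mind is roughly the sketch below: keep the Trainer, but let DeepSpeed ZeRO-3 shard the parameters, gradients, and optimizer state across all four cards, so every GPU computes on every step. These values are my guesses for illustration, not a tested config:

```python
from transformers import TrainingArguments

# ZeRO-3 shards the parameters themselves, so device_map="auto" sharding
# is no longer needed (and should be dropped when DeepSpeed is active).
ds_zero3 = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},  # trade speed for memory
        "offload_param": {"device": "cpu"},      # only if 48 GB is still too tight
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},  # use fp16 instead if the cards lack bf16 support
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # large activation-memory savings at seq_length=8192
    bf16=True,
    deepspeed=ds_zero3,
)
```

Launched with something like `deepspeed --num_gpus=4 finetune/finetune.py ...`. Does that look right?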
How do I use multiple GPUs? I tried CUDA_VISIBLE_DEVICES, with no effect. Did you ever get this working?