starcoder

finetune time

Open Maomaoxion opened this issue 1 year ago • 7 comments

I'm using an A800 80G. How long should fine-tuning take? I'm stuck...

Maomaoxion avatar Jun 07 '23 04:06 Maomaoxion

I have the same problem, finetuning one step takes me about one hour (8 A800 80G GPUs)

Casi11as avatar Jun 10 '23 08:06 Casi11as

How do I use multiple GPUs? I have tried CUDA_VISIBLE_DEVICES, but it had no effect.

Maomaoxion avatar Jun 10 '23 09:06 Maomaoxion

How do I use multiple GPUs? I have tried CUDA_VISIBLE_DEVICES, but it had no effect.

I used deepspeed.

deepspeed --num_gpus=8 finetune/finetune.py --other args

Casi11as avatar Jun 10 '23 09:06 Casi11as
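Some context on why the two approaches behave differently: CUDA_VISIBLE_DEVICES only controls which GPUs a process is allowed to see and does not start any extra worker processes, whereas a distributed launcher such as deepspeed starts one process per GPU and, to my knowledge, exports variables like WORLD_SIZE and LOCAL_RANK to each of them. The sketch below reads those variables so you can check which situation a script is running in. The function name describe_launch is made up for illustration and is not part of finetune.py.

```python
import os


def describe_launch(env=None):
    """Summarise the launch context from environment variables.

    Hypothetical helper for illustration only. It assumes the launcher
    exports WORLD_SIZE and LOCAL_RANK to every worker process.
    """
    env = os.environ if env is None else env

    # Without a distributed launcher these are normally unset,
    # so fall back to a single process with rank 0.
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "0"))

    # Unset means "all GPUs are visible"; otherwise it is a comma-separated list.
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        visible_devices = None
    else:
        visible_devices = [d.strip() for d in visible.split(",") if d.strip()]

    return {
        "world_size": world_size,
        "local_rank": local_rank,
        "visible_devices": visible_devices,
        # More than one worker process means the batch can be split across GPUs.
        "data_parallel": world_size > 1,
    }


if __name__ == "__main__":
    print(describe_launch())
```

If this reports a world_size of 1 even though several GPUs are visible, the script is running as a single process, which would be consistent with setting CUDA_VISIBLE_DEVICES having no visible effect on speed.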

I have the same problem, finetuning one step takes me about one hour (8 A800 80G GPUs)

I think the problem is that 'accelerate', although it distributes the weights across different GPUs, does NOT distribute the computational load.

If you look at the nvidia-smi output while training, you will see only ONE GPU active at any one time.

So basically only 1/8 of your pretty beefy setup is being used at any one time.

phalexo avatar Jun 10 '23 15:06 phalexo
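To make the 1/8 figure concrete, here is a toy model of what is described above: the layers are split across the GPUs and executed strictly one after another, so exactly one GPU is busy at each step. This is a simplified sketch with made-up function names, not a measurement, and it ignores transfer and communication costs.

```python
def naive_model_parallel_utilization(num_gpus, layers_per_gpu=1):
    """Fraction of total GPU-time spent busy when the layers are spread
    over the GPUs and run sequentially (one GPU active per step)."""
    # timeline[t] is the index of the single GPU that is busy at step t.
    timeline = [gpu for gpu in range(num_gpus) for _ in range(layers_per_gpu)]
    total_steps = len(timeline)
    busy_slots = total_steps               # exactly one busy GPU per step
    available_slots = total_steps * num_gpus
    return busy_slots / available_slots


def data_parallel_utilization(num_gpus):
    """Idealised case: every GPU holds a full replica and processes its
    own slice of the batch at the same time (communication ignored)."""
    return 1.0


if __name__ == "__main__":
    print(naive_model_parallel_utilization(8))  # 0.125
    print(data_parallel_utilization(8))         # 1.0
```

Under these assumptions the sequential setup uses 1/8 of the available compute on 8 GPUs, regardless of how many layers each GPU holds.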

I have the same problem, finetuning one step takes me about one hour (8 A800 80G GPUs)

I think the problem is that 'accelerate', although it distributes the weights across different GPUs, does NOT distribute the computational load.

If you look at the nvidia-smi output while training, you will see only ONE GPU active at any one time.

So basically only 1/8 of your pretty beefy setup is being used at any one time.

If that's the case, how do I set up the configuration?

Casi11as avatar Jun 12 '23 01:06 Casi11as

Actually, I have to use accelerate to spread my model across multiple GPUs; otherwise I get a CUDA OOM error when fine-tuning on 4 RTX 6000 48 GB GPUs. However, accelerate can waste a significant amount of memory, as mentioned above, and I still hit an OOM when increasing seq_length to 8192. I am interested in fine-tuning my model with both deepspeed and accelerate. Does anyone know how to set up my code for this?

WrViajero avatar Jun 21 '23 07:06 WrViajero
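One possible reason the OOM appears at seq_length 8192 is that, if the attention scores are fully materialised, their size grows with the square of the sequence length. The rough estimate below illustrates that scaling. The head count of 48 and the 2-byte element size are illustrative assumptions, not values taken from this thread, and whether the scores are materialised at all depends on the attention implementation in use.

```python
def attention_scores_bytes(batch_size, num_heads, seq_length, bytes_per_element=2):
    """Size of one layer's full attention score tensor, shaped
    (batch, heads, seq, seq), if it is materialised in memory."""
    return batch_size * num_heads * seq_length * seq_length * bytes_per_element


if __name__ == "__main__":
    gib = 2 ** 30
    # Hypothetical configuration: batch of 1, 48 heads, 16-bit elements.
    short = attention_scores_bytes(1, 48, 2048)
    long = attention_scores_bytes(1, 48, 8192)
    print(short / gib)   # 0.375
    print(long / gib)    # 6.0
    print(long / short)  # 16.0
```

Going from 2048 to 8192 tokens multiplies this term by 16 under these assumptions, so a setup that fits at a shorter sequence length can run out of memory at 8192 even when the weights are already spread over several GPUs.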

How do I use multiple GPUs? I have tried CUDA_VISIBLE_DEVICES, but it had no effect.

Did you get this working in the end?

CEfanmin avatar Oct 13 '23 06:10 CEfanmin