
Dear author, how much time does it take to train this model? And with what type of GPU cards?

Open zhangyuereal opened this issue 1 year ago • 4 comments

zhangyuereal avatar Dec 27 '23 10:12 zhangyuereal

Thanks for your question. In our experiments, we use 32 A100 GPUs for faster training. However, if you want to fine-tune the model on your own dataset, a single A100 or V100 is enough.

Andy1621 avatar Jan 02 '24 07:01 Andy1621

Hi, how many hours does the total training take on 32 A100s?

unira-zwj avatar Feb 29 '24 13:02 unira-zwj

Knowing how many hours is very helpful (and is often left out of papers). This information lets folks who don't have access to dozens of GPUs know whether they have a chance of training the model in a reasonable time on one or two GPUs.

adeobootpin avatar Mar 08 '24 02:03 adeobootpin

Actually, for instruction tuning, the data size is often small. For example, for a small dataset with thousands of videos, we only use 1 GPU (>40GB) and train for a few hours.
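As a rough back-of-envelope sketch of the GPU-count vs. wall-clock trade-off (the GPU-hour budget and efficiency factor below are hypothetical illustrations, not numbers from the paper, and real jobs rarely scale perfectly linearly):

```python
def wall_clock_hours(gpu_hours_total: float, n_gpus: int, efficiency: float = 1.0) -> float:
    """Estimate wall-clock time for a fixed GPU-hour budget.

    Assumes throughput scales with n_gpus * efficiency, where
    efficiency=1.0 is ideal linear scaling (an optimistic upper bound).
    """
    return gpu_hours_total / (n_gpus * efficiency)

# Hypothetical 640 GPU-hour job, with 90% scaling efficiency assumed.
budget = 640.0
for n in (1, 2, 32):
    print(f"{n:>2} GPU(s): ~{wall_clock_hours(budget, n, efficiency=0.9):.1f} hours")
```

This is why the same run that takes weeks on one card can finish in a day on a large cluster, and why a small fine-tuning job fits comfortably on a single GPU.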

However, in our paper, we hope to verify the effect of data scale and diversity, so we collect millions of videos, which requires many more resources to train on. That's why we release models from different stages. We hope researchers can fine-tune models based on our pretraining.

Besides, in the current codebase, we do not conduct in-depth optimization for low-cost training. Researchers can follow other repos such as LAVIN or Otter for efficient training strategies like QLoRA and low-bit training.
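For intuition, the low-rank adaptation idea behind QLoRA can be sketched in a few lines: the pretrained weight stays frozen and only a small low-rank update is trained. A minimal numpy illustration (layer sizes and scaling factor are arbitrary examples, not values from any of these repos):

```python
import numpy as np

# Hypothetical layer dimensions and LoRA rank, chosen only for illustration.
d_out, d_in, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # trainable low-rank factor
B = np.zeros((d_out, r))                # trainable, zero-init so training starts from W

alpha = 16.0
W_eff = W + (alpha / r) * (B @ A)       # effective weight used in the forward pass

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Only `A` and `B` receive gradients, so the trainable parameter count (and optimizer state) shrinks by orders of magnitude; QLoRA additionally keeps the frozen `W` quantized to 4 bits to cut memory further.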

Andy1621 avatar Mar 10 '24 08:03 Andy1621