Ask-Anything
Dear author, how long does it take to train this model, and with what type of GPU cards?
Thanks for your question. In our experiments, we use 32 A100 GPUs for faster training. However, if you want to fine-tune the model on your own dataset, a single A100 or V100 is enough.
Hi, how many hours does the total training take on 32 A100s?
Knowing how many hours is very helpful (and is often left out of papers). This information lets folks who don't have access to dozens of GPUs know whether they have a chance of training the model in a reasonable time on one or two GPUs.
Actually, for instruction tuning, the data size is often small. For example, with a small dataset of a few thousand videos, we only use a single GPU (>40 GB memory) and train for a few hours, as sketched below.
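For reference, a minimal sketch of the kind of memory-friendly single-GPU fine-tuning loop this implies (mixed precision plus gradient accumulation); `VideoChatModel` and `my_video_dataset` are hypothetical placeholders, not classes from this repo:

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical model/dataset; load the released pretrained weights here.
model = VideoChatModel()
model.cuda().train()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler()
loader = DataLoader(my_video_dataset, batch_size=2, shuffle=True)

accum_steps = 8  # simulate a larger batch without the memory cost
for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(**batch).loss / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```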
However, in our paper we aim to verify the effect of data scale and diversity, so we collect millions of videos, which requires far more resources to train on. That is why we release models from different stages: we hope researchers can fine-tune models based on our pretraining.
Besides, the current codebase does not include in-depth optimization for low-cost training. Researchers can follow other repos, such as LAVIN or Otter, for efficient training strategies like QLoRA and low-bit training. A sketch of the general QLoRA recipe follows.
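A minimal sketch of the general QLoRA recipe using Hugging Face `transformers` and `peft`; the checkpoint name is a placeholder, not one of this repo's models, and adapting it to the video-language models here would need extra wiring:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "your-base-llm",  # placeholder checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters train
```

Because only the low-rank adapters receive gradients while the frozen base stays in 4-bit, this style of training fits on a single consumer-grade or 40 GB GPU.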