VADER Question about the training cost.

Open yiboz2001 opened this issue 1 year ago • 4 comments

Hi, thanks for your excellent work! I would like to know more about the training cost of these three models. Also, I ran the ModelScope script on 4 A40-48G GPUs, and it reports that 460 hours are required for 10,000 iterations. Is this in line with expectations? I am looking forward to your reply.

Jul 21 '24 07:07 yiboz2001

Hi,

The network should probably train in about 6-12 hours. The number of iterations is currently not set properly, sorry about that.

Jul 22 '24 04:07 mihirp1998

Thanks for the response! Please notify me when the corrected version is released, thank you.

Jul 22 '24 06:07 yiboz2001

Hello. Thanks for pointing out the issue! We have updated the default max_train_steps value for reference. Considering you are using 4 GPUs, --gradient_accumulation_steps could be set to 4 or 2 to speed the training process up. Also, please feel free to select the optimal checkpoint based on the avg_loss curve or the visualization from wandb.
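As a rough sanity check on the numbers in this thread, here is a minimal Python sketch (not taken from the VADER repository). It assumes that max_train_steps counts optimizer steps and that each optimizer step processes gradient_accumulation_steps micro-batches per GPU, as is common in Accelerate-style training scripts; whether the VADER scripts count steps this way should be verified. All function names are hypothetical.

```python
def seconds_per_step(total_hours: float, steps: int) -> float:
    """Average wall-clock seconds per reported iteration."""
    return total_hours * 3600 / steps


def effective_batch_size(per_device_batch: int, num_gpus: int,
                         grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer update (assumed convention)."""
    return per_device_batch * num_gpus * grad_accum_steps


def estimated_hours(sec_per_micro_batch: float, max_train_steps: int,
                    grad_accum_steps: int) -> float:
    """Estimated run time if every optimizer step runs
    grad_accum_steps micro-batches back to back (assumption)."""
    return sec_per_micro_batch * grad_accum_steps * max_train_steps / 3600


if __name__ == "__main__":
    # Figures reported in the original question: 460 hours for 10,000 iterations.
    print(f"{seconds_per_step(460, 10_000):.1f} s per iteration")

    # Hypothetical per-device batch size of 1 on 4 GPUs.
    for accum in (4, 2):
        print(accum, effective_batch_size(1, 4, accum))
```

Under these assumptions, halving gradient_accumulation_steps halves the estimated time per optimizer step but also halves the effective batch size, so it is worth watching the avg_loss curve after changing it.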

Jul 22 '24 19:07 QinOwen

Same question. I need 60 hours to train Open-Sora on 2x A6000 GPUs. Could the authors please specify which model was trained for 15 hours in the paper and what the corresponding configuration was?

Aug 21 '24 02:08 choucaicai
