OLMo
Does global_train_batch_size support gradient accumulation?
❓ The question
Hello authors, thank you very much for your inspiring work. I have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I keep global_train_batch_size at the original 2048 and set device_train_microbatch_size to 2? Is this equivalent to using more GPUs?
@jinzhuoran Yes, this should be possible. Have you faced an issue when trying this?
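To make the arithmetic concrete, here is a minimal sketch of how these settings are assumed to interact. This is not the OLMo trainer itself; the variable names simply mirror the config fields from the question, and the computation shown is the standard gradient-accumulation relationship:

```python
# Hypothetical sketch of the batch-size arithmetic (not OLMo's actual code).
# Names mirror the config fields discussed above.

global_train_batch_size = 2048       # total sequences per optimizer step
num_gpus = 8                         # e.g. 8 x A100
device_train_microbatch_size = 2     # sequences per GPU per forward/backward pass

# Each device's share of one optimizer step.
device_train_batch_size = global_train_batch_size // num_gpus  # 256

# Forward/backward passes accumulated on each device before one optimizer step.
grad_accum_steps = device_train_batch_size // device_train_microbatch_size  # 128

# The accumulated micro-batches across all devices reproduce the global batch.
assert grad_accum_steps * device_train_microbatch_size * num_gpus == global_train_batch_size
print(f"{grad_accum_steps} micro-batches per device per optimizer step")
```

Under these assumptions, a smaller device_train_microbatch_size only increases the number of accumulation steps (and thus wall-clock time per optimizer step); the effective global batch, and hence the optimization trajectory, stays the same as with more GPUs.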
Hi, thanks again for the inquiry! We’re currently working on closing out old tickets, so we’re closing this out for now, but if you require a follow-up response, please re-open and we will get back to you!