OLMo
Does global_train_batch_size support gradient accumulation?
❓ The question
Hello authors, thank you very much for your inspiring work. I have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I keep global_train_batch_size at the original 2048 and set device_train_microbatch_size to 2? Is this equivalent to using more GPUs?
@jinzhuoran Yes, this should be possible. Have you faced an issue when trying this?
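To make the arithmetic concrete, here is a minimal sketch of how these settings are assumed to interact. This is not the OLMo trainer itself; the variable names simply mirror the config fields from the question, and the computation shown is the standard gradient-accumulation relationship:

```python
# Hypothetical sketch of the batch-size arithmetic (not OLMo's actual code).
# Names mirror the config fields discussed above.

global_train_batch_size = 2048       # total sequences per optimizer step
num_gpus = 8                         # e.g. 8 x A100
device_train_microbatch_size = 2     # sequences per GPU per forward/backward pass

# Each device's share of one optimizer step.
device_train_batch_size = global_train_batch_size // num_gpus  # 256

# Forward/backward passes accumulated on each device before one optimizer step.
grad_accum_steps = device_train_batch_size // device_train_microbatch_size  # 128

# The accumulated micro-batches across all devices reproduce the global batch.
assert grad_accum_steps * device_train_microbatch_size * num_gpus == global_train_batch_size
print(f"{grad_accum_steps} micro-batches per device per optimizer step")
```

Under these assumptions, a smaller device_train_microbatch_size only increases the number of accumulation steps (and thus wall-clock time per optimizer step); the effective global batch, and hence the optimization trajectory, stays the same as with more GPUs.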
Hi, thanks again for the inquiry! We’re currently working on closing out old tickets, so we’re closing this out for now, but if you require a follow-up response, please re-open and we will get back to you!