
Global batch size question

Open liutianlin0121 opened this issue 1 year ago • 7 comments

Hi!

Thanks again for the awesome repo. I have a small question regarding the global batch size of DPO training reported in the paper vs used in the code base.

In the paper, it is mentioned that, for DPO, "We train all models with a global batch size of 32". This is consistent with the hyperparameters of HuggingFaceH4/zephyr-7b-beta.

In the codebase, however, we are advised to use 8 GPUs to reproduce zephyr-7b-beta here:

You will require 8 GPUs (80GB of VRAM) to train the full model.

Since per_device_train_batch_size=8 in recipes/zephyr-7b-beta/dpo/config_full.yaml, the global batch size is 64, not 32, when using 8 GPUs. While this differs from the paper, the global batch size = 64 setting is consistent with the hyperparameters of alignment-handbook/zephyr-7b-dpo-full.
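For reference, the arithmetic behind this: the effective global batch size is per_device_train_batch_size × number of GPUs × gradient_accumulation_steps. A minimal sketch (the helper function is just illustrative, and I am assuming gradient_accumulation_steps=1, which is what the 8 × 8 = 64 figure above implies):

```python
def global_batch_size(per_device_train_batch_size: int,
                      num_gpus: int,
                      gradient_accumulation_steps: int = 1) -> int:
    """Effective global batch size accumulated across all devices and steps."""
    return per_device_train_batch_size * num_gpus * gradient_accumulation_steps

# Settings from recipes/zephyr-7b-beta/dpo/config_full.yaml on 8 GPUs:
print(global_batch_size(8, 8, 1))  # 64, not the 32 reported in the paper

# Lowering per_device_train_batch_size to 4 would recover the paper's value:
print(global_batch_size(4, 8, 1))  # 32
```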

My guess is that a global batch size of 32 or 64 would give similar performance, say, on MT-Bench. Could you confirm this? Many thanks! I am about to launch some experiments, and I want to get the details right so as to reproduce the results from the paper as closely as possible 🙏.

liutianlin0121 · Nov 21 '23