[BUG] Is it right that per_device_train_batch_size = per_device_mini_train_batch_size * gradient_accumulation_steps?
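For context on the question in the title: DeepSpeed's core configuration enforces train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size, so on a single device the relation asked about holds up to the data-parallel factor. Below is a minimal sketch of that consistency check, assuming the per_device_* flags in the title map onto those core knobs; the function and argument names are illustrative, not DeepSpeed's API.

```python
def check_batch_config(train_batch_size: int,
                       micro_batch_per_gpu: int,
                       grad_accum_steps: int,
                       world_size: int) -> None:
    """Hypothetical helper mirroring the invariant DeepSpeed validates:
    train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size."""
    expected = micro_batch_per_gpu * grad_accum_steps * world_size
    if train_batch_size != expected:
        raise ValueError(
            f"train_batch_size ({train_batch_size}) != "
            f"{micro_batch_per_gpu} * {grad_accum_steps} * {world_size} "
            f"(= {expected})")

# Example: 8 data-parallel GPUs, per-device micro batch 4, and 2 gradient
# accumulation steps give a global train batch of 4 * 2 * 8 = 64. With
# world_size == 1 this reduces to the per-device relation in the title.
check_batch_config(train_batch_size=64, micro_batch_per_gpu=4,
                   grad_accum_steps=2, world_size=8)
```

If the three values do not multiply out, DeepSpeed rejects the configuration at initialization, so the per-device product relation is effectively required rather than merely conventional.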
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
- Go to '...'
- Click on '....'
- Scroll down to '....'
- See error
Expected behavior
A clear and concise description of what you expected to happen.
ds_report output
Please run ds_report to give us details about your setup.
Screenshots
If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
- OS: [e.g. Ubuntu 18.04]
- GPU count and types [e.g. two machines with x8 A100s each]
- Interconnects (if applicable) [e.g., two machines connected with 100 Gbps IB]
- Python version
- Any other relevant info about your setup
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
Docker context
Are you using a specific docker image that you can share?
Additional context
Add any other context about the problem here.