
All benefits of using a larger batch size assume the training throughput increases?

Open SimLif opened this issue 2 years ago • 0 comments

  • All benefits of using a larger batch size assume the training throughput increases. If it doesn't, fix the bottleneck or use the smaller batch size.
  • Gradient accumulation simulates a larger batch size than the hardware can support and therefore does not provide any throughput benefits. It should generally be avoided in applied work.
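For concreteness, here is a minimal sketch of what gradient accumulation looks like in practice (PyTorch-style; `model`, `optimizer`, `loss_fn`, and `loader` are assumed to exist and are placeholders, not part of the playbook). It illustrates why the technique simulates a larger batch without any throughput gain: every micro-batch still requires its own forward and backward pass, only the optimizer update is deferred.

```python
accumulation_steps = 4  # effective batch size = loader batch size * 4

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    # Scale the loss so that the accumulated gradients average over the
    # effective batch instead of summing across micro-batches.
    loss = loss_fn(model(inputs), targets) / accumulation_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one optimizer update per effective batch
        optimizer.zero_grad()  # reset accumulated gradients
```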

Is more stable gradient descent guaranteed by increasing the batch size?
In which scenarios should gradient accumulation be used?

SimLif · Jan 28 '23