The distributed training example fails to mention batch and LR scaling

Open martin-gorner opened this issue 2 years ago • 4 comments

Keras.io example: https://keras.io/examples/nlp/data_parallel_training_with_keras_nlp/
Merged PR: https://github.com/keras-team/keras-io/pull/1395

This example is good on the whole but it would be much better with proper batch size and learning rate scaling. Without this, using two accelerators instead of one will not train any faster.

The usual scaling is:

batch_size = strategy.num_replicas_in_sync * single_worker_batch_size
The large global batch is processed on the multiple accelerators in chunks, one chunk per accelerator. Without increasing the batch size, you are sending smaller per-worker batches to the accelerators, potentially under-utilizing them.

lr = strategy.num_replicas_in_sync * single_worker_lr
Bigger batches also mean fewer gradient updates per epoch. Without scaling the LR, the model will learn more slowly on multiple workers. Gradient updates computed on bigger batches need to be allowed to do "more training work" through a higher learning rate.
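As a minimal sketch of the two rules above, the arithmetic can be written in plain Python. The helper `scale_for_replicas`, the `ScaledHyperparams` container, and the example values (2 replicas, batch size 32, learning rate 1e-4) are hypothetical and chosen only for illustration. In the actual example the replica count would come from `strategy.num_replicas_in_sync`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledHyperparams:
    """Hypothetical container for the scaled values."""
    global_batch_size: int
    learning_rate: float


def scale_for_replicas(num_replicas, single_worker_batch_size, single_worker_lr):
    """Apply the linear scaling rule of thumb to batch size and LR."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be >= 1")
    return ScaledHyperparams(
        global_batch_size=num_replicas * single_worker_batch_size,
        learning_rate=num_replicas * single_worker_lr,
    )


# Assumption: with tf.distribute this would be
# strategy.num_replicas_in_sync. It is hard-coded here so the sketch
# runs without accelerators.
num_replicas = 2

params = scale_for_replicas(
    num_replicas,
    single_worker_batch_size=32,  # illustrative value
    single_worker_lr=1e-4,        # illustrative value
)

# Each accelerator receives one chunk of the global batch, so the
# per-replica batch stays at the single-worker size.
per_replica_batch = params.global_batch_size // num_replicas

print("global batch size:", params.global_batch_size)
print("per-replica batch size:", per_replica_batch)
print("scaled learning rate:", params.learning_rate)
```

The scaled values would then be passed to the dataset batching call and the optimizer. This is only a starting point, and the resulting values may still need tuning.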

Of course, these are just rules of thumb. Actual optimal values can only be obtained by careful hyper-parameter tuning, measuring both raw speed and time to convergence.

martin-gorner avatar Aug 30 '23 12:08 martin-gorner

@shivance I believe you authored this guide.

martin-gorner avatar Aug 30 '23 12:08 martin-gorner

@martin-gorner @sachinprasadhs Can I work on this Issue?

Zekrom-7780 avatar Oct 29 '23 18:10 Zekrom-7780

@Zekrom-7780, thanks for volunteering. Feel free to create a PR.

sachinprasadhs avatar Oct 30 '23 18:10 sachinprasadhs

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Apr 28 '24 01:04 github-actions[bot]

This issue was closed because it has been inactive for more than 1 year.

github-actions[bot] avatar Apr 28 '25 02:04 github-actions[bot]