ART
Docs on decision-making process for choosing batch size, learning rate, etc?
At a first approximation, it's not obvious how to think about choosing a batch size and learning rate. Small batches reduce inference overhead on the GPUs and generally shorten iteration time, but they can also make training runs unstable.
Reducing the learning rate and using small batches seems like a reasonable way to get good performance with low feedback latency, but I'm still not sure how to calculate either number. Docs would be super helpful!
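For what it's worth, one heuristic from the large-batch training literature is the linear scaling rule: pick a reference (batch size, learning rate) pair that trains stably, then scale the learning rate proportionally when changing the batch size. This is a sketch of that heuristic, not ART's actual recommendation, and the reference values below are placeholders, not ART defaults:

```python
# Hypothetical helper illustrating the linear scaling rule:
# scale the learning rate proportionally with batch size
# relative to a reference pair known to train stably.
# REF_* values are placeholders, not ART defaults.

REF_BATCH_SIZE = 32
REF_LEARNING_RATE = 1e-5

def scaled_learning_rate(batch_size: int) -> float:
    """Return a learning rate scaled linearly with batch size."""
    return REF_LEARNING_RATE * batch_size / REF_BATCH_SIZE

# Halving the batch size halves the learning rate under this rule,
# which matches the intuition of pairing small batches with a lower LR.
print(scaled_learning_rate(16))  # smaller batch -> smaller LR
print(scaled_learning_rate(64))  # larger batch -> larger LR
```

Whether this rule holds for RL fine-tuning (as opposed to supervised pretraining, where it was proposed) is exactly the kind of thing docs could clarify.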