ART
Docs on decision-making process for choosing batch size, learning rate, etc?
At a first approximation, it's not obvious how to think about choosing a batch size and learning rate. Small batches reduce inference overhead on the GPUs and generally shorten iteration time, but they can also make training runs unstable.
Reducing the learning rate and using small batches seems like a reasonable way to get good performance with low feedback latency, but I'm still not sure how to calculate either number. Docs would be super helpful!
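For what it's worth, one heuristic from the large-batch training literature is the linear scaling rule: pick a reference (batch size, learning rate) pair that trains stably, then scale the learning rate proportionally when changing the batch size. This is a sketch of that heuristic, not ART's actual recommendation, and the reference values below are placeholders, not ART defaults:

```python
# Hypothetical helper illustrating the linear scaling rule:
# scale the learning rate proportionally with batch size
# relative to a reference pair known to train stably.
# REF_* values are placeholders, not ART defaults.

REF_BATCH_SIZE = 32
REF_LEARNING_RATE = 1e-5

def scaled_learning_rate(batch_size: int) -> float:
    """Return a learning rate scaled linearly with batch size."""
    return REF_LEARNING_RATE * batch_size / REF_BATCH_SIZE

# Halving the batch size halves the learning rate under this rule,
# which matches the intuition of pairing small batches with a lower LR.
print(scaled_learning_rate(16))  # smaller batch -> smaller LR
print(scaled_learning_rate(64))  # larger batch -> larger LR
```

Whether this rule holds for RL fine-tuning (as opposed to supervised pretraining, where it was proposed) is exactly the kind of thing docs could clarify.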