tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
[One](https://github.com/google-research/tuning_playbook#determining-the-feasible-batch-sizes-and-estimating-training-throughput) of the sections mentions that `training throughput` is equivalent to `time per step`. I have a doubt here. Suppose there are two batch sizes, `64` and `128`,...
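To make the relationship concrete, here is a minimal sketch (the timings are made up purely for illustration): throughput is examples processed per second, i.e. `batch_size / time_per_step`, so a larger batch size only helps if time per step does not grow proportionally.

```python
# Minimal sketch of the batch size / step time / throughput relationship.
# All timing numbers below are illustrative, not measured.

def throughput(batch_size: int, time_per_step_s: float) -> float:
    """Training throughput = examples processed per second."""
    return batch_size / time_per_step_s

# Hypothetical case: doubling the batch size less than doubles the
# step time, so throughput improves.
print(throughput(64, 0.10))   # 640 examples/s at batch size 64
print(throughput(128, 0.15))  # ~853 examples/s at batch size 128

# If the step time doubled instead, throughput would be unchanged and
# the larger batch size would buy nothing:
print(throughput(128, 0.20))  # 640 examples/s again
```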
The goal of a machine learning algorithm is to **minimize** the expected validation/generalization error, not **maximize** it [[1]](#1).

[1] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. *Deep Learning*. MIT Press, 2016.
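For reference, one common way to write the objective being minimized is the expected loss on held-out data:

```latex
% Standard statement of the goal: minimize expected generalization error.
\theta^{\star} = \arg\min_{\theta}\;
  \mathbb{E}_{(x, y) \sim \mathcal{D}}\!\left[ \ell\big(f_{\theta}(x), y\big) \right]
```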
> - All benefits of using a larger batch size assume the training throughput increases. If it doesn't, fix the bottleneck or use the smaller batch size.
> - Gradient...
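One way to check the first point empirically is to time a few steps at each candidate batch size. A minimal sketch, assuming a `train_step` function and input `batch` of your own (both are placeholders, not part of the playbook):

```python
import time

def measure_throughput(train_step, batch, batch_size, num_steps=20, warmup=5):
    """Time `num_steps` training steps and return examples/second.

    The `warmup` steps are excluded so one-time compilation and
    allocation costs don't skew the measurement.
    """
    for _ in range(warmup):
        train_step(batch)
    start = time.perf_counter()
    for _ in range(num_steps):
        train_step(batch)
    elapsed = time.perf_counter() - start
    return batch_size * num_steps / elapsed

# Compare, e.g., batch sizes 64 and 128: if throughput does not
# increase at 128, the larger batch size has hit a bottleneck.
```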
In the section on **Potential fixes for common instability patterns** there is a line stating:

```
Norm(x + f(x)) known to cause issues.
```

[Text is here](https://github.com/google-research/tuning_playbook/blob/890cd5387477a5ea1b3a7a40fe4ad1d077f54151/README.md?plain=1#L1860). Are there any...
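For context: `Norm(x + f(x))` is the Post-LN residual arrangement, and the commonly recommended alternative is Pre-LN, `x + f(Norm(x))`. A minimal PyTorch sketch of the two (the small MLP is just a stand-in for any residual branch `f`):

```python
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Norm(x + f(x)): the arrangement flagged as causing instability."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(x + self.f(x))

class PreLNBlock(nn.Module):
    """x + f(Norm(x)): normalize before the branch, keep the residual path clean."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return x + self.f(self.norm(x))

x = torch.randn(4, 256)
print(PostLNBlock(256)(x).shape, PreLNBlock(256)(x).shape)
```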
Fix the missing parentheses from the Nesterov equation.
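For reference, a common statement of the Nesterov momentum update with the parentheses in place (using γ for the momentum coefficient and η for the learning rate, which is an assumption about the playbook's notation):

```latex
% Nesterov momentum; the parentheses group the full update term.
v_{t+1} = \gamma v_t + \nabla \ell(\theta_t)
\qquad
\theta_{t+1} = \theta_t - \eta \left( \gamma v_{t+1} + \nabla \ell(\theta_t) \right)
```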
Hey, thanks for writing such a great document! We've translated it into Simplified Chinese; hope it helps!
Hi everyone, I would like to know your opinion about seq-to-point deep learning models for forecasting and missing data. I am dealing with a time series that has a lot...
Hi, thanks for sharing this wonderful document. I have two questions: First, how would we know whether the model has already reached its best performance after trying different tuning methods?...
I believe gradient checkpointing can be very useful if you have to maintain some minimum batch size and you can't do that with your hardware. I was training a...
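For anyone weighing the same trade-off, here is a minimal PyTorch sketch of the idea using `torch.utils.checkpoint` (the `use_reentrant` flag requires a reasonably recent PyTorch): each block's activations are recomputed during the backward pass instead of being stored, trading extra compute for the memory needed to fit a larger batch.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """Recompute each block's activations in backward instead of storing
    them, freeing activation memory for a larger batch size."""
    def __init__(self, dim, num_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)
        )

    def forward(self, x):
        for block in self.blocks:
            # use_reentrant=False is the recommended modern variant.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP(512)
x = torch.randn(128, 512, requires_grad=True)
model(x).sum().backward()  # activations are recomputed during this call
```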