tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
[One](https://github.com/google-research/tuning_playbook#determining-the-feasible-batch-sizes-and-estimating-training-throughput) of the sections mentions that `training throughput` is equivalent to `time per step`. I have a doubt here. Suppose there are two batch sizes, `64` and `128`,...
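To make the relationship concrete, here is a minimal sketch (the timings are made up purely for illustration): throughput is examples processed per second, i.e. `batch_size / time_per_step`, so a larger batch size only helps if time per step does not grow proportionally.

```python
# Minimal sketch of the batch size / step time / throughput relationship.
# All timing numbers below are illustrative, not measured.

def throughput(batch_size: int, time_per_step_s: float) -> float:
    """Training throughput = examples processed per second."""
    return batch_size / time_per_step_s

# Hypothetical case: doubling the batch size less than doubles the
# step time, so throughput improves.
print(throughput(64, 0.10))   # 640 examples/s at batch size 64
print(throughput(128, 0.15))  # ~853 examples/s at batch size 128

# If the step time doubled instead, throughput would be unchanged and
# the larger batch size would buy nothing:
print(throughput(128, 0.20))  # 640 examples/s again
```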
The goal of a machine learning algorithm is to **minimize** the expected validation/generalization error, not **maximize** it [[1]](#1).

[1] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. *Deep Learning*. MIT Press, 2016.
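For reference, one common way to write the objective being minimized is the expected loss on held-out data:

```latex
% Standard statement of the goal: minimize expected generalization error.
\theta^{\star} = \arg\min_{\theta}\;
  \mathbb{E}_{(x, y) \sim \mathcal{D}}\!\left[ \ell\big(f_{\theta}(x), y\big) \right]
```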
> - All benefits of using a larger batch size assume the training throughput increases. If it doesn't, fix the bottleneck or use the smaller batch size.
> - Gradient...
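One way to check the first point empirically is to time a few steps at each candidate batch size. A minimal sketch, assuming a `train_step` function and input `batch` of your own (both are placeholders, not part of the playbook):

```python
import time

def measure_throughput(train_step, batch, batch_size, num_steps=20, warmup=5):
    """Time `num_steps` training steps and return examples/second.

    The `warmup` steps are excluded so one-time compilation and
    allocation costs don't skew the measurement.
    """
    for _ in range(warmup):
        train_step(batch)
    start = time.perf_counter()
    for _ in range(num_steps):
        train_step(batch)
    elapsed = time.perf_counter() - start
    return batch_size * num_steps / elapsed

# Compare, e.g., batch sizes 64 and 128: if throughput does not
# increase at 128, the larger batch size has hit a bottleneck.
```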
In the section on **Potential fixes for common instability patterns** there is a line stating:

```
Norm(x + f(x)) known to cause issues.
```

[Text is here](https://github.com/google-research/tuning_playbook/blob/890cd5387477a5ea1b3a7a40fe4ad1d077f54151/README.md?plain=1#L1860). Are there any...
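For context: `Norm(x + f(x))` is the Post-LN residual arrangement, and the commonly recommended alternative is Pre-LN, `x + f(Norm(x))`. A minimal PyTorch sketch of the two (the small MLP is just a stand-in for any residual branch `f`):

```python
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Norm(x + f(x)): the arrangement flagged as causing instability."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(x + self.f(x))

class PreLNBlock(nn.Module):
    """x + f(Norm(x)): normalize before the branch, keep the residual path clean."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return x + self.f(self.norm(x))

x = torch.randn(4, 256)
print(PostLNBlock(256)(x).shape, PreLNBlock(256)(x).shape)
```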
Fix the missing parentheses from the Nesterov equation.
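For reference, a common statement of the Nesterov momentum update with the parentheses in place (using γ for the momentum coefficient and η for the learning rate, which is an assumption about the playbook's notation):

```latex
% Nesterov momentum; the parentheses group the full update term.
v_{t+1} = \gamma v_t + \nabla \ell(\theta_t)
\qquad
\theta_{t+1} = \theta_t - \eta \left( \gamma v_{t+1} + \nabla \ell(\theta_t) \right)
```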
Hey, thanks for writing such a great document! We've translated it into Simplified Chinese; hope it helps!
Hi everyone, I would like to know your opinion about seq-to-point deep learning models for forecasting and missing data. I am dealing with a time series that has a lot...
Hi, thanks for sharing this wonderful document. I have two questions: First, how would we know whether the model has already reached its best performance after trying different tuning methods?...
I believe gradient checkpointing can be very useful if you have to maintain some minimum batch size and you can't do that with your hardware. I was training a...
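For anyone weighing the same trade-off, here is a minimal PyTorch sketch of the idea using `torch.utils.checkpoint` (the `use_reentrant` flag requires a reasonably recent PyTorch): each block's activations are recomputed during the backward pass instead of being stored, trading extra compute for the memory needed to fit a larger batch.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """Recompute each block's activations in backward instead of storing
    them, freeing activation memory for a larger batch size."""
    def __init__(self, dim, num_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)
        )

    def forward(self, x):
        for block in self.blocks:
            # use_reentrant=False is the recommended modern variant.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP(512)
x = torch.randn(128, 512, requires_grad=True)
model(x).sum().backward()  # activations are recomputed during this call
```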