tuning_playbook
`training throughput` may not be equal to `time per step`
One of the sections mentions that `training throughput` is equivalent to `time per step`. I have a doubt about this. Suppose there are two batch sizes, 64 and 128. If `time per step` is 1 in both cases, `training throughput` still differs between them. So `training throughput` seems to be the better reflection of the effect of `batch size`.
They are equivalent by definition:
training throughput = (# examples processed per second)
The right hand side is equal to (batch size) / (time per step). Rearranging this equation gives:
time per step = (batch size) / (training throughput)
I think the confusion is that "equivalent" does not mean "equal" in this context. Rather, given a particular `batch size`, knowing the throughput is equivalent information to knowing the step time, in the sense that knowing one allows you to compute the other.
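To make the relationship concrete, here is a minimal Python sketch using the two batch sizes from the question. The function names `training_throughput` and `time_per_step` are hypothetical helpers written for this illustration, not part of the playbook, and the step time of 1 second is the assumed value from the example above.

```python
def training_throughput(batch_size: int, step_time_s: float) -> float:
    """Examples processed per second, for a fixed batch size."""
    return batch_size / step_time_s


def time_per_step(batch_size: int, throughput: float) -> float:
    """Seconds per step, recovered from throughput and batch size."""
    return batch_size / throughput


# Same step time, different batch sizes: the throughputs differ.
for batch_size in (64, 128):
    throughput = training_throughput(batch_size, step_time_s=1.0)
    # With the batch size known, the step time can be recovered exactly.
    recovered = time_per_step(batch_size, throughput)
    print(f"batch_size={batch_size}: "
          f"throughput={throughput} examples/s, step time={recovered} s")
```

With the batch size fixed, each quantity determines the other; across different batch sizes, equal step times do not imply equal throughput.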
Hey @SimLif, you're right that your language might be more precise here. In general, training throughput is one of the more useful metrics to look at when choosing a batch size.