`training throughput` may not be equal to `time per step`
One of the sections says that training throughput is equivalent to time per step. I have a doubt about this. Suppose we compare two batch sizes, 64 and 128. If the time per step is 1 for both, the training throughput is still different: the batch size of 128 processes twice as many examples in the same time. So training throughput seems to reflect the effect of batch size better than time per step does.
They are equivalent by definition:
training throughput = (# examples processed per second)
The right-hand side is equal to (batch size) / (time per step). Rearranging this equation gives:
time per step = (batch size) / (training throughput)
I think the confusion is that "equivalent" does not mean "equal" in this context. Rather, given a particular batch size, knowing the throughput is equivalent information to knowing the step time, in the sense that knowing one allows you to compute the other.
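To make the conversion concrete, here is a minimal sketch using the 64 vs. 128 example from the question. The function names are hypothetical (they are not from any library), and it assumes time per step is measured in seconds, so that throughput comes out in examples per second.

```python
def throughput_from_step_time(batch_size: int, time_per_step: float) -> float:
    """Training throughput (examples per second) for a given batch size."""
    return batch_size / time_per_step


def step_time_from_throughput(batch_size: int, throughput: float) -> float:
    """Time per step (seconds) for a given batch size; inverse of the above."""
    return batch_size / throughput


if __name__ == "__main__":
    # Same step time (assumed to be 1 second) at two different batch sizes.
    for batch_size in (64, 128):
        throughput = throughput_from_step_time(batch_size, time_per_step=1.0)
        # Knowing the batch size, the step time can be recovered from throughput.
        recovered = step_time_from_throughput(batch_size, throughput)
        print(f"batch={batch_size}: throughput={throughput}, step time={recovered}")
```

The two numbers differ across batch sizes (64 vs. 128 examples per second at the same step time), but for any fixed batch size each one determines the other.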
Hey @SimLif, you're right that your language might be more precise here. In general, training throughput is one of the more useful metrics to look at when choosing a batch size.