Blitz tutorial takes too long to run (GD instead of SGD?)
Hi there! In the 60-minute blitz tutorial (https://fluxml.ai/tutorials/2020/09/15/deep-learning-flux.html), the part where we train a network on CIFAR10 takes longer than expected. Could it be because we go through every minibatch in each epoch, instead of sampling only one? I am referring specifically to this line: https://github.com/FluxML/model-zoo/blob/52a7b8923ef7f0313b6e38765536166ae1ef7961/tutorials/60-minute-blitz/60-minute-blitz.jl#L366. Because of it, I feel like we are actually doing non-stochastic (full-batch) gradient descent, which would explain the long runtime.
`train` is already a vector of batches (see here), so iterating over it in a for loop performs mini-batch SGD: each iteration takes one gradient step on one minibatch, and one pass over the whole vector is one epoch.
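A minimal sketch of what that looks like, using toy data in place of CIFAR10 (the array sizes, model, and batch size of 10 here are illustrative, not the tutorial's; the tutorial builds `train` the same way with `partition`, just with batches of 1000):

```julia
using Flux
using Base.Iterators: partition

# Toy stand-in for the tutorial's data: 100 samples, 4 features, 2 classes.
X = rand(Float32, 4, 100)
Y = Flux.onehotbatch(rand(1:2, 100), 1:2)

# `train` is a vector of (input, label) minibatches.
train = [(X[:, i], Y[:, i]) for i in partition(1:100, 10)]

m = Chain(Dense(4, 8, relu), Dense(8, 2), softmax)
loss(x, y) = Flux.crossentropy(m(x), y)
opt = Descent(0.1)
ps = Flux.params(m)

# One epoch: one gradient step per minibatch, over *all* minibatches.
# This is mini-batch SGD, not full-batch gradient descent.
for (x, y) in train
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```

`Flux.train!(loss, ps, train, opt)` does the same loop internally, which is why each epoch touches every minibatch.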
But our mini-batches appear not to be so mini... the batch size is 1000! Reducing it should help.
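A hedged sketch of the fix, assuming the tutorial's preprocessed arrays (the names `train_x`/`train_y` and the 49000-sample training split are stand-ins based on my reading of the tutorial; only the batch size changes):

```julia
using Base.Iterators: partition

# Stand-ins for the tutorial's preprocessed CIFAR10 data:
# train_x :: 32×32×3×49000 Float32 array, train_y :: 10×49000 one-hot matrix.
batchsize = 128  # e.g. 128 instead of 1000
train = [(train_x[:, :, :, i], train_y[:, i]) for i in partition(1:49000, batchsize)]
```

Note that an epoch still visits all the data; smaller batches just mean more, cheaper gradient steps per epoch.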
My bad, I was wrong about the meaning of a training epoch! Thanks for your answer; I will try reducing the batch size.