Shuffling data between epochs
Please let me know if I have misunderstood something or get anything wrong in the explanation below:
I know it has been pointed out that, by default, DIGITS shuffles the data when the dataset is created: https://github.com/NVIDIA/DIGITS/issues/29
HDF5 (another database type that would allow shuffling) is only available for classification tasks, not for segmentation or object detection, unless you use a custom Python layer: https://github.com/NVIDIA/DIGITS/issues/1548
HDF5 was also looked into, but it was not merged: https://github.com/NVIDIA/DIGITS/issues/224
My question is this: DIGITS uses LMDB to store the training data and shuffles it once, when it creates the dataset.
Does DIGITS ever shuffle the training order again, or does every epoch present the images in the same order?
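A minimal sketch of what "shuffle once at creation" means for an ordered key-value store, assuming the store hands records back in sorted-key order. A plain Python dict stands in for the LMDB database, and `create_dataset` and its zero-padded key format are hypothetical, not DIGITS's actual dataset-creation code:

```python
import random

def create_dataset(image_paths, seed=0):
    """Shuffle once, then write each record under a zero-padded key.

    Hypothetical stand-in: the returned dict plays the role of an LMDB
    database whose cursor yields records in sorted-key order.
    """
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # the one and only shuffle
    width = len(str(len(paths)))
    # Zero-padded keys make byte-wise (lexicographic) order equal numeric order,
    # so reading in sorted-key order replays the creation-time shuffle exactly.
    return {f"{i:0{width}d}".encode(): p for i, p in enumerate(paths)}

db = create_dataset([f"img_{i}.png" for i in range(10)])
print([db[k] for k in sorted(db)])
```

Under these assumptions, every read in sorted-key order returns the same shuffled sequence, so the shuffle is fixed for the lifetime of the dataset.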
As far as I understand, HDF5 would allow the image order to be shuffled between epochs, but because LMDB is an ordered key-value store that is read sequentially, the data can only be pulled in the same order every time.
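The sketch below contrasts the two read patterns, assuming a store that supports both a sorted sequential scan (like a cursor) and lookup by key. LMDB itself does support point lookups by key, so a loader could in principle keep a list of keys and permute it every epoch. The dict stand-in and the function names are hypothetical and are not part of DIGITS or Caffe:

```python
import random

def sequential_epochs(db, n_epochs):
    """Cursor-style reads: every epoch walks the keys in sorted order."""
    return [[db[k] for k in sorted(db)] for _ in range(n_epochs)]

def reshuffled_epochs(db, n_epochs, seed=0):
    """Random-access reads: permute the key list afresh at each epoch."""
    rng = random.Random(seed)
    keys = sorted(db)
    epochs = []
    for _ in range(n_epochs):
        rng.shuffle(keys)  # new permutation every epoch
        epochs.append([db[k] for k in keys])  # fetch each record by key
    return epochs

db = {f"{i:02d}".encode(): f"img_{i}.png" for i in range(8)}
fixed = sequential_epochs(db, 3)
shuffled = reshuffled_epochs(db, 3)
```

The trade-off is that random-access reads give up the sequential access pattern a cursor scan enjoys, which may matter for large databases on slow storage.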
By not reshuffling the data between epochs, we lose the IID property of the training samples: every epoch feeds SGD the same sequence of minibatches, which weakens its stochasticity and makes overfitting easier.
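To make the minibatch point concrete, here is a small simulation with hypothetical helper names and toy sizes (32 samples, batch size 8, 10 epochs). It only counts how many distinct minibatch compositions SGD sees; it does not measure any effect on generalization:

```python
import random

def minibatches(order, batch_size):
    """Split one epoch's sample order into consecutive minibatches."""
    return [frozenset(order[i:i + batch_size]) for i in range(0, len(order), batch_size)]

def distinct_batches(n_samples=32, batch_size=8, n_epochs=10, reshuffle=True, seed=0):
    """Count the different minibatch compositions seen over all epochs."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)  # shuffle once, as at dataset creation
    seen = set()
    for _ in range(n_epochs):
        if reshuffle:
            rng.shuffle(order)  # per-epoch reshuffle
        seen.update(minibatches(order, batch_size))
    return len(seen)

print(distinct_batches(reshuffle=False))  # 4: the same four batches every epoch
print(distinct_batches(reshuffle=True))
```

With a fixed order the count never grows past `n_samples / batch_size`, whereas reshuffling keeps producing new combinations of samples within each minibatch.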
What are you guys using to combat this problem? Is there a way to solve it?
Thanks for your time!