
Shuffling, k-fold and stratified k-fold

Open mratsim opened this issue 5 years ago • 0 comments

Shuffle

Deterministic shuffles are needed in general for both deep learning and machine learning.

Input data is often ordered: the IMDB dataset, for example, is all positive reviews followed by all negative reviews. Training on it in that order causes a large gradient update at the transition from one section to the next and skews the final weights toward the negative class, whereas shuffled data reaches a better balance.
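As a rough illustration of what "deterministic shuffle" means here, the sketch below is written in plain Python rather than against Arraymancer's API. The function names `shuffled_indices` and `shuffle_in_unison` are hypothetical and only serve the example. The idea is to derive a permutation from an explicit seed and apply that same permutation to both samples and labels, so the pairs stay aligned and the run is reproducible.

```python
import random


def shuffled_indices(n, seed):
    """Return a permutation of range(n) that depends only on `seed`."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


def shuffle_in_unison(samples, labels, seed):
    """Apply one seeded permutation to both sequences so pairs stay aligned."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    order = shuffled_indices(len(samples), seed)
    return [samples[i] for i in order], [labels[i] for i in order]


# Toy data ordered the way the IMDB example is: all positives, then all negatives.
reviews = [f"review_{i}" for i in range(6)]
labels = [1, 1, 1, 0, 0, 0]
shuffled_reviews, shuffled_labels = shuffle_in_unison(reviews, labels, seed=42)
```

Calling `shuffle_in_unison` again with the same seed gives the same order, which is what makes experiments comparable across runs. A different seed gives a different permutation.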

K-Fold and Stratified K-Fold

Controlling folds is key to reaching the highest accuracy and to making sure we don't contaminate the model with target leakage when building complex ensembles.
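To make the two splitting schemes concrete, here is a minimal sketch in plain Python. It is not an existing Arraymancer API, and the names `k_fold_indices` and `stratified_k_fold_indices` are assumptions made for the example. Both generators yield `(train, test)` index lists from an explicit seed. The stratified variant deals the indices of each class round-robin across folds, so every fold keeps roughly the class proportions of the full dataset.

```python
import random
from collections import defaultdict


def _splits(folds):
    """Yield (train, test) pairs, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def k_fold_indices(n, k, seed):
    """Plain k-fold: shuffle the indices once, then cut them into k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most 1
    yield from _splits(folds)


def stratified_k_fold_indices(labels, k, seed):
    """Stratified k-fold: spread each class evenly over the k folds."""
    if not 2 <= k <= len(labels):
        raise ValueError("k must satisfy 2 <= k <= len(labels)")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so leftovers are not always in fold 0
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(pos + offset) % k].append(idx)
        offset += len(members)
    yield from _splits(folds)


# Six positive and six negative samples, split into three stratified folds.
toy_labels = [1] * 6 + [0] * 6
for train, test in stratified_k_fold_indices(toy_labels, k=3, seed=0):
    positives = sum(toy_labels[i] for i in test)
    print(f"test size={len(test)}, positives in test={positives}")
```

Because each sample index lands in exactly one test fold, out-of-fold predictions for an ensemble can be built without a model ever predicting on rows it was trained on, which is one way to keep target leakage out of stacked models.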

mratsim, Dec 08 '18 21:12