classifier-reborn: Implement stratified k-fold cross-validation


ibnesayeed opened this issue

The current k-fold cross-validation assumes that the supplied sample data is uniformly randomized, so it simply slices the array into individual folds. We should instead partition the data so that the proportion of each class is preserved in every fold. Stratification could be the default (or the only) partitioning strategy, or alternatively an optional boolean parameter could be provided to enable it.
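To illustrate why plain slicing depends on the input order, here is a minimal Ruby sketch. It is not classifier-reborn's actual code: the `sliced_folds` helper and the `[category, text]` sample format are hypothetical. When the samples arrive grouped by class, contiguous slices end up dominated by a single class.

```ruby
# Hypothetical illustration of plain (non-stratified) k-fold slicing;
# not classifier-reborn's actual implementation.
# Samples are assumed to be [category, text] pairs.
def sliced_folds(samples, k)
  base, extra = samples.size.divmod(k)
  start = 0
  # Cut the array into k contiguous slices whose sizes differ by at most one.
  Array.new(k) do |i|
    len = base + (i < extra ? 1 : 0)
    fold = samples[start, len]
    start += len
    fold
  end
end

# Input sorted by class: 10 "spam" samples followed by 20 "ham" samples.
samples = Array.new(10) { |i| ["spam", "spam #{i}"] } +
          Array.new(20) { |i| ["ham", "ham #{i}"] }

# Count "spam" samples per fold.
p sliced_folds(samples, 5).map { |fold| fold.count { |category, _| category == "spam" } }
# => [6, 4, 0, 0, 0]
```

With a stratified split, each of the five folds would instead contain 2 "spam" and 4 "ham" samples, matching the overall 1:2 ratio.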

Feb 15 '17 17:02 ibnesayeed

I'm open to this, but wouldn't know how to do it.

Feb 22 '17 16:02 Ch4s3

To enforce this, we will first have to group the supplied samples into buckets by class and then partition each bucket into k roughly equal parts. Finally, we pick one chunk from each bucket to assemble each of the k folds. It is not difficult to do, and I can take care of it when I get a chance to work on the code again. For now, however, we shuffle the sample data before splitting, which should in theory have a similar effect, although less precisely, since the class balance of each fold then depends on the randomness of the shuffle.
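A minimal Ruby sketch of the bucketing approach described above, assuming samples are `[category, text]` pairs. The helpers `split_into` and `stratified_folds` and the `rng:` keyword are hypothetical, not part of classifier-reborn's API. The `offset` rotation is an addition of this sketch so that leftover samples from different classes do not all pile up in the first folds.

```ruby
# Hypothetical helpers for illustration; not part of classifier-reborn's API.
# Samples are assumed to be [category, text] pairs.

# Split an array into k contiguous chunks whose sizes differ by at most one.
def split_into(array, k)
  base, extra = array.size.divmod(k)
  start = 0
  Array.new(k) do |i|
    len = base + (i < extra ? 1 : 0)
    chunk = array[start, len]
    start += len
    chunk
  end
end

# Build k folds in which each category keeps (roughly) its overall proportion.
def stratified_folds(samples, k, rng: Random.new)
  raise ArgumentError, "k must be at least 2" if k < 2

  folds = Array.new(k) { [] }
  offset = 0 # rotates which folds receive the leftover samples of each class
  samples.group_by(&:first).each_value do |bucket|
    # 1. Shuffle within the class so folds do not depend on input order.
    # 2. Cut the class bucket into k near-equal chunks.
    # 3. Hand one chunk to each fold.
    split_into(bucket.shuffle(random: rng), k).each_with_index do |chunk, i|
      folds[(i + offset) % k].concat(chunk)
    end
    offset = (offset + bucket.size % k) % k
  end
  folds
end

samples = Array.new(10) { |i| ["spam", "spam #{i}"] } +
          Array.new(20) { |i| ["ham", "ham #{i}"] }
folds = stratified_folds(samples, 5, rng: Random.new(42))
p folds.map { |fold| fold.count { |category, _| category == "spam" } }
# => [2, 2, 2, 2, 2]
```

Each fold can then serve once as the test set while the remaining k - 1 folds form the training set. A class with fewer than k samples cannot appear in every fold, so a real implementation may want to warn about that case.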

Feb 27 '17 23:02 ibnesayeed
