deep-rules icon indicating copy to clipboard operation
deep-rules copied to clipboard

Strive for balance between training, tuning, and testing datasets

Open hugoaerts opened this issue 5 years ago • 1 comments

Scientists should strive for balance in the training, tuning, and testing datasets to assure that different phenotypic groups are represented appropriately/similarly. This refers to the balanced proportion of different classes of outcome or target variables. If class imbalance is inevitable, appropriate strategies like augmentation or bootstrapping can be used.

hugoaerts avatar Nov 07 '18 21:11 hugoaerts

With augmentation, one needs to be careful to augment the training, testing, and validation data separately, not together (and then splitting) to prevent overfitting.

Benjamin-Lee avatar Nov 13 '18 00:11 Benjamin-Lee