Data split for deep learning methods

Open Sara04 opened this issue 1 year ago • 0 comments

Hello,

At the moment, the data are split into train and test sets with StratifiedKFold cross-validation, so the class distribution is preserved in both sets. For deep learning methods, validation_split is set to 0.2, which means Keras takes the last 20 percent of the training samples as the validation set (https://www.tensorflow.org/api_docs/python/tf/keras/Model). As a result, chronology is ignored in the train/test split but preserved in the train/validation split. In addition, for Cho2017 and PhysionetMI (at least), the class distribution is not uniform over time, so there is a large mismatch between the class distributions of the training and validation subsets.
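To illustrate the mismatch, here is a minimal standard-library sketch. It is not MOABB or Keras code: `tail_split` only mimics the "last fraction" behaviour of validation_split, `stratified_split` is a hypothetical helper showing one possible alternative, and the toy labels (50 "left" trials followed by 50 "right" trials) are an assumed extreme case of a class distribution that changes over time.

```python
import random
from collections import Counter, defaultdict


def tail_split(labels, val_fraction=0.2):
    """Mimic validation_split: the last fraction of samples is used for validation."""
    n = len(labels)
    split_at = n - int(n * val_fraction)
    indices = list(range(n))
    return indices[:split_at], indices[split_at:]


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Hypothetical alternative: draw the validation fraction from each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, val_idx = [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        n_val = int(len(class_indices) * val_fraction)
        val_idx.extend(class_indices[:n_val])
        train_idx.extend(class_indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


def class_counts(labels, indices):
    """Count how many samples of each class fall into the given index set."""
    return Counter(labels[i] for i in indices)


# Assumed toy session: classes are recorded in blocks, not interleaved.
labels = ["left"] * 50 + ["right"] * 50

train_idx, val_idx = tail_split(labels)
print("tail split       train:", dict(class_counts(labels, train_idx)),
      "val:", dict(class_counts(labels, val_idx)))

train_idx, val_idx = stratified_split(labels)
print("stratified split train:", dict(class_counts(labels, train_idx)),
      "val:", dict(class_counts(labels, val_idx)))
```

With the tail split, the validation subset contains only "right" trials, while the stratified variant keeps both classes balanced in both subsets. In practice, something like scikit-learn's `train_test_split(..., stratify=y)` combined with the `validation_data` argument of Keras `fit` could serve the same purpose, but whether that fits the existing pipeline is an open question.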

Sara04, Aug 29 '23 20:08