moabb
Data split for deep learning methods
Hello,
At the moment, the data are split into train and test sets with StratifiedKFold cross-validation, so the class distribution is preserved in both the train and the test sets. For deep learning, validation_split is 0.2, so the last 20 percent of the training samples are used for validation (https://www.tensorflow.org/api_docs/python/tf/keras/Model). This means that chronology is ignored in the train/test split but not in the train/validation split. In addition, for Cho2017 and PhysionetMI (at least), the class distribution is not uniform over time, so the class proportions in the training and validation subsets can differ substantially.
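To make the mismatch concrete, here is a minimal pure-Python sketch on a hypothetical label sequence in which all class-1 trials are recorded after the class-0 trials. `keras_style_validation_split` emulates the documented behaviour of `Model.fit(validation_split=...)`, which takes the last fraction of the samples before any shuffling. `stratified_validation_split` is a hypothetical helper showing one possible alternative that keeps the class proportions, in the same spirit as StratifiedKFold. Neither function is part of moabb or Keras.

```python
import math
import random
from collections import Counter, defaultdict


def keras_style_validation_split(labels, validation_split=0.2):
    """Emulate tf.keras validation_split: the validation set is the LAST
    fraction of the samples, taken in order (no shuffling, no stratification)."""
    split_at = int(math.floor(len(labels) * (1.0 - validation_split)))
    indices = list(range(len(labels)))
    return indices[:split_at], indices[split_at:]


def stratified_validation_split(labels, validation_split=0.2, seed=0):
    """Hypothetical alternative: draw the same fraction from every class,
    so train and validation keep the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, val_idx = [], []
    for _, idxs in sorted(by_class.items()):
        idxs = idxs[:]  # copy before shuffling
        rng.shuffle(idxs)
        n_val = int(round(len(idxs) * validation_split))
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


if __name__ == "__main__":
    # Hypothetical session: 50 trials of class 0 followed by 50 of class 1.
    labels = [0] * 50 + [1] * 50

    _, val = keras_style_validation_split(labels)
    print("keras-style validation classes:", Counter(labels[i] for i in val))

    _, val = stratified_validation_split(labels)
    print("stratified validation classes:", Counter(labels[i] for i in val))
```

With the Keras-style split, the validation set here contains only class-1 trials, whereas the stratified version draws 10 trials from each class. If a stratified split is preferred, the resulting indices could be passed to Keras through the `validation_data` argument of `Model.fit` instead of `validation_split`; scikit-learn's `train_test_split(..., stratify=y)` is another existing way to obtain such indices.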