moabb
Data split for deep learning methods
Hello,
At the moment, the data is split into train and test sets with StratifiedKFold cross-validation, so the class distribution is preserved in both sets. For the deep learning methods, validation_split is 0.2, so the last 20 percent of the training samples are used for validation (https://www.tensorflow.org/api_docs/python/tf/keras/Model). This means that chronology is neglected in the train/test split but not in the train/validation split. In addition, for Cho2017 and PhysionetMI (at least), the class distribution is not uniform over time, so there is a large mismatch between the training and validation subsets.
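To illustrate the mismatch, here is a minimal sketch on toy labels (not real MOABB data). The helper names `last_fraction_split` and `stratified_split` are hypothetical, and the label layout (60 trials of one class followed by 40 of the other) is an assumption chosen to exaggerate the effect. The first helper mimics how validation_split holds out the tail of the training set, the second draws the same fraction from each class.

```python
import numpy as np


def last_fraction_split(y, validation_split=0.2):
    """Hold out the last fraction of samples, as Keras validation_split does."""
    split_at = int(len(y) * (1.0 - validation_split))
    idx = np.arange(len(y))
    return idx[:split_at], idx[split_at:]


def stratified_split(y, validation_split=0.2, seed=0):
    """Hold out the same fraction of every class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)  # chronology is ignored here, as in the train/test split
        n_val = int(round(len(idx) * validation_split))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(np.array(train_idx)), np.sort(np.array(val_idx))


# Toy labels whose class distribution changes over time (assumption).
y = np.array([0] * 60 + [1] * 40)

tail_train, tail_val = last_fraction_split(y)
strat_train, strat_val = stratified_split(y)

# Fraction of class 1 in each subset, for both strategies.
print("tail split:       train", y[tail_train].mean(), "val", y[tail_val].mean())
print("stratified split: train", y[strat_train].mean(), "val", y[strat_val].mean())
```

With indices like these, the validation subset could be passed to `Model.fit` through `validation_data` instead of `validation_split`. scikit-learn's `train_test_split` with `stratify=y` would do the same job as the hypothetical helper above.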