
WithinSubjectEvaluation() and WithinDatasetEvaluation()

Open toncho11 opened this issue 2 years ago • 8 comments

I think these evaluation methods are much needed. You do not have them now, right?

  • WithinSubjectEvaluation() - evaluates the performance on all sessions for the same subject.
  • WithinDatasetEvaluation() - shuffles the data from all subjects (and sessions) and then selects 1/5 for validation and the rest for training. Both training and validation will include data from all subjects. Results here will be more variable, so it should be run several times, as in cross-validation.
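
Roughly, this is the kind of evaluation I have in mind for WithinDatasetEvaluation, sketched with paradigm.get_data() and scikit-learn rather than an existing MOABB class (the dataset and pipeline here are just placeholders):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder
from pyriemann.estimation import XdawnCovariances
from pyriemann.classification import MDM
from moabb.datasets import BNCI2014009
from moabb.paradigms import P300

paradigm = P300()
dataset = BNCI2014009()

# Pool the epochs from every subject and session of the dataset.
X, y, metadata = paradigm.get_data(dataset=dataset, subjects=dataset.subject_list)
y = LabelEncoder().fit_transform(y)  # "Target"/"NonTarget" -> 1/0

# Shuffled 5-fold CV on the pooled data: each fold keeps ~1/5 for validation,
# and both splits contain data from all subjects and sessions.
pipeline = make_pipeline(XdawnCovariances(nfilter=4), MDM())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```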

toncho11 avatar Nov 04 '22 15:11 toncho11

By checking here

You have CrossSubjectEvaluation(), which is similar to my proposed WithinDatasetEvaluation() in the sense that it trains on the entire dataset except for one subject, but it is definitely not the same.

I think CrossSessionEvaluation() is not the same as my proposed WithinSubjectEvaluation().
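
For reference, this is how I currently run the existing evaluation, as far as I understand the API (dataset and pipeline are just examples):

```python
from sklearn.pipeline import make_pipeline
from pyriemann.estimation import XdawnCovariances
from pyriemann.classification import MDM
from moabb.datasets import BNCI2014009
from moabb.evaluations import CrossSubjectEvaluation
from moabb.paradigms import P300

paradigm = P300()
pipelines = {"XdawnCov+MDM": make_pipeline(XdawnCovariances(nfilter=4), MDM())}

# Leave-one-subject-out within a single dataset:
# trains on all subjects except one and tests on the held-out subject.
evaluation = CrossSubjectEvaluation(
    paradigm=paradigm, datasets=[BNCI2014009()], overwrite=False
)
results = evaluation.process(pipelines)  # DataFrame with one score per held-out subject
print(results[["dataset", "subject", "score"]])
```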

toncho11 avatar Nov 04 '22 15:11 toncho11

I don't really understand the differences yet, can you maybe clarify? It looks like CrossSubjectEvaluation() is doing what you want for WithinSubject -- train on all except K subjects, evaluate on K. Your WithinDatasetEvaluation sounds like pooling all the data regardless of session and subject and doing k-fold CV on it, is that right?

vinay-jayaram avatar Nov 04 '22 16:11 vinay-jayaram

Yes, for WithinDatasetEvaluation. I want a model trained on the entire dataset, so that I could use it with new, unseen subjects in the future.

For me, WithinSessionEvaluation() calculates a score for each session, but I would like to merge all sessions of a single user and then calculate a single score, as in the sketch below.
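
Concretely, something like this (built on paradigm.get_data() and scikit-learn; it is not an existing MOABB evaluation, and the dataset/pipeline are only examples):

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from pyriemann.estimation import XdawnCovariances
from pyriemann.classification import MDM
from moabb.datasets import BNCI2014009
from moabb.paradigms import P300

paradigm = P300()
dataset = BNCI2014009()
pipeline = make_pipeline(XdawnCovariances(nfilter=4), MDM())

for subject in dataset.subject_list:
    # All sessions of this subject are returned together and pooled here,
    # so the cross-validation folds mix epochs from every session.
    X, y, metadata = paradigm.get_data(dataset=dataset, subjects=[subject])
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    print(f"subject {subject}: {scores.mean():.3f} over {metadata['session'].nunique()} sessions")
```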

toncho11 avatar Nov 04 '22 17:11 toncho11

Hi, for now we concentrate on global benchmarking and transfer learning. Those are interesting suggestions for next steps.

sylvchev avatar Jan 02 '23 15:01 sylvchev

I have several concerns about those evaluations:

  • WithinSubjectEvaluation mixes all sessions of a subject. This is not desirable, as there are important differences between sessions for the same subject. Mixing them in the training dataset will result in very optimistic results (it could be considered as leakage).
  • WithinDatasetEvaluation is very close to CrossSubjectEvaluation. Mixing subjects in the training dataset will also result in overly optimistic results, as there are important differences between subjects. It could also be seen as leaking test data, compared to a realistic setting where no data from the new user is available. To evaluate the impact of having only few data available for a subject, we have implemented learning curves to make systematic benchmarks (see the sketch after this list).

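To illustrate the learning-curve option, it is run through WithinSessionEvaluation roughly like this (argument names written from memory, please check the learning-curve examples in the documentation for the exact usage):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from pyriemann.estimation import XdawnCovariances
from pyriemann.classification import MDM
from moabb.datasets import BNCI2014009
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms import P300

paradigm = P300()
pipelines = {"XdawnCov+MDM": make_pipeline(XdawnCovariances(nfilter=4), MDM())}

# Fractions of the training set to evaluate, and how many random permutations
# to draw for each fraction (more permutations for the smaller fractions).
data_size = dict(policy="ratio", value=np.geomspace(0.02, 1, 6))
n_perms = np.floor(np.geomspace(20, 2, len(data_size["value"]))).astype(int)

evaluation = WithinSessionEvaluation(
    paradigm=paradigm,
    datasets=[BNCI2014009()],
    data_size=data_size,
    n_perms=n_perms,
    overwrite=False,
)
results = evaluation.process(pipelines)  # one score per subject/session/data size/permutation
```
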
sylvchev avatar Jan 02 '23 16:01 sylvchev

What is the benefit of gathering all subjects/sessions before splitting them into training/validation sets? As you indicated, this will yield variable results, so you need to run multiple evaluations and average them. I think the variability will come mostly from the subject and the session.

sylvchev avatar Jan 03 '23 13:01 sylvchev

Thanks for @sylvchev's comments. I agree with you. But do you have some references to support these points? Some papers may not take this into account and still report good performance.

dawin2015 avatar Feb 16 '24 03:02 dawin2015

I would like to do CrossSubjectEvaluation but using data from many datasets together.

datasets = [BNCI2014008(), BNCI2014009(), BNCI2015003()]

This makes 28 subjects (8 + 10 + 10). So each time I would like to test on 1 subject while training on the other 27. Currently I think this is not the case: the cross-subject evaluation will be performed within each of the 3 datasets, meaning 1 vs 7, 1 vs 9, and 1 vs 9 respectively. Is this correct?
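
What I would like is something like this manual sketch, a pooled leave-one-subject-out over all 28 subjects (the common channel list, resampling rate, and epoch window below are my assumptions for making the epochs concatenable across datasets, not something MOABB enforces):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from pyriemann.estimation import XdawnCovariances
from pyriemann.classification import MDM
from moabb.datasets import BNCI2014008, BNCI2014009, BNCI2015003
from moabb.paradigms import P300

# Align the datasets so their epochs can be concatenated: a channel subset
# assumed to be present in all three, a common sampling rate, and a fixed
# epoch window.
paradigm = P300(channels=["Fz", "Cz", "Pz", "Oz"], resample=128, tmin=0.0, tmax=0.8)

Xs, ys, groups = [], [], []
group_id = 0
for dataset in [BNCI2014008(), BNCI2014009(), BNCI2015003()]:
    for subject in dataset.subject_list:
        X, y, _ = paradigm.get_data(dataset=dataset, subjects=[subject])
        Xs.append(X)
        ys.append(y)
        groups.append(np.full(len(y), group_id))  # one group per (dataset, subject)
        group_id += 1

X, y, groups = np.concatenate(Xs), np.concatenate(ys), np.concatenate(groups)

# Leave-one-subject-out over the pooled 28 subjects:
# train on 27 subjects, test on the held-out one.
pipeline = make_pipeline(XdawnCovariances(nfilter=4), MDM())
for train, test in LeaveOneGroupOut().split(X, y, groups):
    pipeline.fit(X[train], y[train])
    print(pipeline.score(X[test], y[test]))
```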

toncho11 avatar Mar 23 '24 06:03 toncho11