benchmark_rsfMRI_prediction Question about tangent-embedding & separation of training and test set

Open WillemB2104 opened this issue 5 years ago • 4 comments

Hi, I just looked through your very interesting neuro-imaging paper and saw that you recommend using tangent-embeddings as a connectivity measure for classification. In this paper (Appendix A) you state that:

The computation is made of two step: First a group average covariance matrix is computed from the covariances of the training subjects. Second, it is used to transform covariance matrices, in the train set or the test set.
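The two steps quoted above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the paper's or the repository's implementation: the helper names (`fit_reference`, `tangent_transform`) are hypothetical, the data are synthetic, and the group reference is an arithmetic mean for brevity, whereas nilearn's tangent kind uses a geometric mean.

```python
import numpy as np


def _apply_to_eigenvalues(sym_matrix, func):
    """Apply func to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)
    return (eigvecs * func(eigvals)) @ eigvecs.T


def fit_reference(train_covariances):
    """Step 1: group reference computed from the training covariances only.

    Assumption: arithmetic mean, for brevity (nilearn uses a geometric mean).
    """
    return np.mean(train_covariances, axis=0)


def tangent_transform(covariances, reference):
    """Step 2: project covariances into the tangent space at `reference`."""
    # Whitening matrix: reference^(-1/2)
    whitening = _apply_to_eigenvalues(reference, lambda v: 1.0 / np.sqrt(v))
    # Matrix logarithm of each whitened covariance
    return np.array([
        _apply_to_eigenvalues(whitening @ cov @ whitening, np.log)
        for cov in covariances
    ])


# Synthetic stand-in for per-subject time series: (n_timepoints, n_regions)
rng = np.random.default_rng(0)
series = [rng.standard_normal((200, 4)) for _ in range(10)]
covariances = np.array([np.cov(ts, rowvar=False) for ts in series])
train_covs, test_covs = covariances[:8], covariances[8:]

# The reference never sees the test subjects
reference = fit_reference(train_covs)
train_tangent = tangent_transform(train_covs, reference)
test_tangent = tangent_transform(test_covs, reference)
print(train_tangent.shape, test_tangent.shape)
```

Because the reference is computed from `train_covs` alone, the same transform can be applied to subjects that were not available at fitting time.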

I was curious to see how to implement this efficiently and was lucky enough to stumble upon this nice GitHub repo. I had a look at your examples and noticed, for example in the ABIDE case, that regardless of the connectivity measure (e.g. tangent, partial correlation or full correlation) the entire dataset is used. Here is a link to the code.

Am I missing something, or is there really no split across training and test set here? Looking forward to hearing from you, and many thanks again for sharing your code!

Jan 21 '20 15:01 WillemB2104

Thanks for your interest.

No, you are not missing anything. There is no split across training and test set. The tangent space parametrization is done over the entire set of samples. This is one way of doing it.

The other way is as described in the Appendix. For this, you can rely on the Nilearn class ConnectivityMeasure:

from nilearn.connectome import ConnectivityMeasure

Then call fit on the training set only, and apply transform to both the training and test sets.

I hope this clarifies things.

Jan 21 '20 22:01 KamalakerDadi

Thanks for the quick response and example.

I am wondering, however, whether these two ways of doing the tangent space parametrization (creating the group average covariance matrix from the training samples only versus from the entire dataset) might affect classification performance. In any classification framework I try to keep training and test sets separate to avoid data leakage.

Have you by any chance tested whether these two different approaches change your results in your rsfMRI benchmark?

Jan 23 '20 10:01 WillemB2104

No, I haven't tested this, though I am interested. I believe the difference might be small.

Jan 23 '20 10:01 KamalakerDadi

Indeed, I think that this is a valid question, and I would love to get an empirical answer to it.

I must apologize for the difference between the appendix and the code: I am to blame, as I wrote the appendix having in mind that the biomarkers should be applicable to new subjects.

Jan 23 '20 13:01 GaelVaroquaux
