
Group analysis: recommended workflow and statistics

Open ftadel opened this issue 7 years ago • 5 comments

The pipeline currently described on the website is not giving acceptable results, and we could not publish any statistics at the group level in the recent Frontiers article. This needs to be addressed urgently.

Two online tutorials to fix: https://neuroimage.usc.edu/brainstorm/Tutorials/Workflows#Constrained_cortical_sources https://neuroimage.usc.edu/brainstorm/Tutorials/VisualGroup#Group_analysis:_Sources

Unconstrained sources must be addressed as well. We are now recommending that most EEG users use unconstrained dipole orientations, but we have no option at all to offer for group analysis.

ftadel avatar Nov 29 '18 11:11 ftadel

Last messages exchanged on this topic.

Dimitrios:

Attached slides: Frontiers dataset - Dimitrios Pantazis.pptx

A summary of some findings:

- The |Faces| - |Scrambled| contrast works well for sources.
- Applying the z-score after low-pass filtering leads to noisy maps. It should be applied before.
- I think dSPM maps produce better results than min-norm (more sensitive). (Results should be the same if we apply min-norm + z-score, though I did not test.)
- We may need a better colormap than the red-blue-white.
- I reproduced the MNE paper results reasonably well, but with FDR thresholds.
- Cluster-size inference failed to produce good clusters (unlike the MNE paper). I used the FieldTrip cluster-size test implemented in Brainstorm. It took over 10 hours and I don't know why the results don't match the MNE paper ones.
- Critically, the |Faces – Scrambled| contrast is good for sensors but does not appear to be good for source maps. In particular, the maps look OK, but the chi-squared statistic fails miserably and I could not find a way to make it work (everything is significant). I think we seriously need to consider removing it from our recommendations, which is unfortunate because I suggested it in the first place. It was the only option I could think of for performing statistics in a single-condition case (when there is nothing to permute). This means we do not have a statistical procedure for a single condition, but the chi-squared does not work anyway. In a single-condition case, researchers should use condition vs. baseline (thus making it a 2-condition test) and use |condition| - |baseline| for sources.

Sylvain:

-The |Faces| - |Scrambled| contrast works well for sources

Qualitatively, the maps you produced with dSPM or MNE 2-condition comparisons are consistent with one another: there is no major unexpected difference between them, with the peaks of activity in similar regions. The only differences are in the threshold cut. This is good.

-Applying z-score after low-pass filtering leads to noisy maps. It should be applied before.

I think this is because the LPF reduces the signal variance tremendously, hence an overestimation of the z-scores. The z-score map you show on slide #2 would probably require a higher threshold.
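A minimal numerical sketch of this effect (synthetic white noise, made-up sampling rate and cutoff; none of this comes from the Frontiers pipeline): low-pass filtering shrinks the baseline standard deviation, so z-scoring after the filter inflates the values compared to filtering the z-scored signal.

```python
# Toy illustration (all numbers hypothetical): low-pass filtering shrinks the
# baseline variance, so z-score-after-filter is inflated relative to
# filter-after-z-score, even though the underlying signal is identical.
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
fs = 600.0                              # sampling rate in Hz (made up)
t = np.arange(-0.2, 0.5, 1 / fs)
x = rng.standard_normal(t.size)         # white-noise "source" time series
base = t < 0                            # baseline window

def zscore(sig):
    """Baseline z-score: subtract baseline mean, divide by baseline std."""
    return (sig - sig[base].mean()) / sig[base].std()

b, a = butter(4, 40 / (fs / 2))         # 40 Hz low-pass (made-up cutoff)

z_then_filt = filtfilt(b, a, zscore(x))   # recommended order
filt_then_z = zscore(filtfilt(b, a, x))   # z-score after LPF: inflated

print(np.abs(z_then_filt).max(), np.abs(filt_then_z).max())
```

With these synthetic numbers the post-filter z-scores come out noticeably larger, purely because the filter removed most of the baseline noise power that the z-score divides by.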

-I think dSPM maps produce better results than min-norm (more sensitive). (Results should be the same if we apply min-norm + z-score, though I did not test)

Would be good to test. I would say dSPM is more specific rather (less spurious activity). Is that what you meant?

-We may need a better colormap than the red-blue-white

Agreed. I like the jet one you have tried.

-I reproduced the MNE paper results reasonably well but with FDR thresholds

Agreed.

-Cluster-size inference failed to produce good clusters (unlike the MNE paper). I used the FieldTrip cluster-size test implemented in Brainstorm. It took over 10 hours and I don't know why the results don't match the MNE paper ones.

This is something we need to understand.

-Critically, the |Faces – Scrambled| contrast is good for sensors but does not appear to be good for source maps. In particular, the maps look OK, but the chi-squared statistic fails miserably and I could not find a way to make it work (everything is significant). I think we seriously need to consider removing it from our recommendations, which is unfortunate because I suggested it in the first place. It was the only option I could think of for performing statistics in a single-condition case (when there is nothing to permute). This means we do not have a statistical procedure for a single condition, but the chi-squared does not work anyway. In a single-condition case, researchers should use condition vs. baseline (thus making it a 2-condition test) and use |condition| - |baseline| for sources.

I don’t understand the logic of applying chi-squared here, because it is a 2-condition study design in the first place. What do you see if you apply chi2 stats on abs(Faces) only, for instance?
Another thing I don’t understand is why you z-scored the abs(dSPM(Faces-Scrambled)) before applying chi2. I thought you said that z-scoring abs maps is not recommended/wrong.

Dimitrios:

“Would be good to test. I would say dSPM is more specific rather (less spurious activity). Is that what you meant?”

For the min-norm maps, there are 3 cases:

  1. No z-score is applied to the min-norm maps. In that case I found less detected activity than for dSPM maps (thus less sensitive; this is what I meant in the sentence above).
  2. The z-score is applied before low-pass filtering. In that case I expect results similar to dSPM.
  3. The z-score is applied after low-pass filtering, in which case we saw overestimated z-values and non-specific significance maps.

“I don’t understand the logic of applying chi-squared here, because it is a 2-condition study design in the first place. What do you see if you apply chi2 stats on abs(Faces) only, for instance?”

Regarding the chi-squared test logic: in this study we have paired measurements, i.e. every participant was measured in both the faces and scrambled conditions. Thus we can construct the difference faces – scrambled, and the analysis becomes equivalent to a one-sample test across participants. In that sense, the test becomes equivalent to a one-condition experiment.

[The underlying problem is that in sensor space we can work with positive and negative values, so it is meaningful to do a t-test against 0 (or permute +1/-1 sign flips). But in source space we need to apply ‘abs’, otherwise the cortical maps do not align across subjects and the maps are really noisy. So we have |faces – scrambled| values, and there is no straightforward test for positive-only values.]
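The bracketed point can be made concrete with a toy sign-flip test (synthetic numbers, not the Frontiers data): sign-flip permutations are exchangeable for the signed per-subject differences, but once the values are rectified the null hypothesis of sign symmetry is violated and everything comes out significant.

```python
# Toy one-sample sign-flip permutation test (hypothetical data shapes).
# Valid for SIGNED per-subject differences; invalid for rectified |diff|.
import numpy as np

rng = np.random.default_rng(1)
n_subj = 16
diff = rng.standard_normal(n_subj)        # faces - scrambled, null data

def signflip_pval(d, n_perm=5000, rng=rng):
    """Two-sided sign-flip permutation p-value for a one-sample t statistic."""
    t_obs = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
    flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm = flips * d
    t_perm = perm.mean(1) / (perm.std(1, ddof=1) / np.sqrt(d.size))
    return (np.sum(np.abs(t_perm) >= abs(t_obs)) + 1) / (n_perm + 1)

p_signed = signflip_pval(diff)            # well behaved under the null
p_rect = signflip_pval(np.abs(diff))      # all-positive values: sign flips
                                          # break the null, p is spuriously tiny
print(p_signed, p_rect)
```

Even with pure noise, the rectified version rejects: the mean of |diff| is necessarily far above zero, which is exactly the problem with testing positive-only source values against 0.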

“Another thing I don’t understand is why you z-scored the abs(dSPM(Faces-Scrambled)) before applying chi2. I thought you said that z-scoring abs maps is not recommended/wrong.”

The idea behind the chi-squared test was that a sum of squared Gaussian variables is chi-squared distributed. So we need to make sure each subject has approximately Gaussian cortical maps. This is the reason why I wanted to apply the z-score at the very end, just before the chi-squared test. But even that does not work.

The test is mostly unworkable because we also need to apply ‘abs’ and spatial smoothing to the cortical maps. My thought was that zscore(spatiallysmooth(abs(cortical maps))) would be at least approximately Gaussian. Any other sequence, such as spatiallysmooth(abs(zscore(cortical maps))), produces even less Gaussian distributions.
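A quick way to see the calibration problem (toy Gaussian maps, hypothetical subject and vertex counts, not the real pipeline): the sum of squared standard-normal values across subjects does follow chi2(N), but the same construction on rectified-then-z-scored values does not, so the chi-squared reference distribution is wrong for it.

```python
# Toy check of the chi-squared assumption (all sizes hypothetical):
# sums of squared Gaussian maps fit chi2(N); rectified-then-z-scored maps
# have the right mean and variance per subject but the wrong shape.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subj, n_vertices = 12, 20000

g = rng.standard_normal((n_subj, n_vertices))   # ~Gaussian subject maps
q_gauss = (g ** 2).sum(0)                        # should be ~chi2(n_subj)

# Rectified maps, z-scored across vertices as a Gaussianizing attempt:
r = np.abs(g)
rz = (r - r.mean(1, keepdims=True)) / r.std(1, keepdims=True)
q_rect = (rz ** 2).sum(0)                        # mean matches, shape does not

# Kolmogorov-Smirnov goodness of fit against the chi2(n_subj) reference:
print(stats.kstest(q_gauss, 'chi2', args=(n_subj,)).pvalue)  # decent fit
print(stats.kstest(q_rect, 'chi2', args=(n_subj,)).pvalue)   # clear misfit
```

The squared rectified values are skewed and heavier-tailed than chi2(1) components, so the summed statistic is wider than the chi2(N) reference, which is one way "everything becomes significant".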

In general, I think the chi-squared test is probably a bad idea and I should not have suggested it to begin with :(. There is a more fundamental reason against it: in retrospect, this is the kind of test where, even if Faces > Scrambled for some subjects and Faces < Scrambled for others, |Faces-Scrambled| combined across all subjects would still be significant. Consequently, I am worried that even small noise variations produce inflated values without a real effect.

[This somewhat resembles MEG classification, because in decoding everything becomes positive as well. It was recently attacked as a fixed-effects rather than a random-effects analysis, precisely because decoding can give significance if faces > scrambled for some subjects and faces < scrambled for others: https://www.sciencedirect.com/science/article/pii/S1053811916303470 (but I don’t agree entirely with this paper and its prevalence-inference arguments).

At least for decoding we have proper statistical procedures, whereas the chi-squared test seems very approximate.]

My feeling is that we should be encouraging contrasts like Faces – Scrambled in the sensor domain, and |Faces| - |Scrambled| in the source domain. This may seem awkward at first, but it is mostly common practice. And take away the chi-squared test (sorry Francois).
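As a sanity check of that recommendation, here is a toy paired permutation test on rectified values (synthetic data, hypothetical effect size): because both conditions are measured in every subject, swapping the Faces/Scrambled labels within each subject is a valid permutation scheme even after taking absolute values, unlike the one-sample case above.

```python
# Toy paired permutation test for |Faces| - |Scrambled| (hypothetical data):
# swap the two condition labels within each subject; the labels are
# exchangeable under the null even after rectification.
import numpy as np

rng = np.random.default_rng(3)
n_subj = 16
faces = np.abs(rng.standard_normal(n_subj) + 3.0)   # rectified, with effect
scram = np.abs(rng.standard_normal(n_subj))         # rectified, no effect

def paired_perm_pval(a, b, n_perm=5000, rng=rng):
    """Two-sided paired permutation test: swap condition labels per subject."""
    d_obs = (a - b).mean()
    swaps = rng.random((n_perm, a.size)) < 0.5      # which subjects to swap
    d_perm = np.where(swaps, b - a, a - b).mean(axis=1)
    return (np.sum(np.abs(d_perm) >= abs(d_obs)) + 1) / (n_perm + 1)

p = paired_perm_pval(faces, scram)
print(p)   # small p: the Faces > Scrambled effect is detected
```

Swapping labels within a subject is equivalent to sign-flipping that subject's difference, but here the flipped quantity |Faces| - |Scrambled| is genuinely symmetric around 0 under the null, so the test is well calibrated.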

ftadel avatar Nov 29 '18 11:11 ftadel

@richardmleahy @HosseinShahabi Weren't you working on this? Statistics are still the major black hole of the software...

ftadel avatar Jun 11 '19 08:06 ftadel

Ping

ftadel avatar Nov 09 '20 08:11 ftadel

Still some major work needed on wrapping up the recommended pipeline for unconstrained sources...

ftadel avatar Sep 19 '21 07:09 ftadel

Forum question: https://neuroimage.usc.edu/forums/t/unconstrained-to-flat-map-with-pca-method-details-on-the-process/36054

ftadel avatar Jul 19 '22 14:07 ftadel