
[ENH] Spatio-temporal cluster permutation testing: Define minimum "nodes" to define a cluster

Open sappelhoff opened this issue 2 years ago • 29 comments

For spatio-temporal cluster permutation testing, it's important to consider how many "nodes" make up a cluster.

For example:

  • In the time domain (considering 1 "channel"): Are two neighboring significant time points already a cluster? Or does it start with "3", "4", "N"?
  • In the spatial domain: the same, but for neighboring channels that are significant at the same point in time

Currently I can only assume that in MNE-Python, any neighboring relationship (2 timepoints, or 2 channels) makes up a cluster.

It would be really nice to

  1. document this
  2. offer a parameter to customize this

In FieldTrip, this is done as follows:

cfg.minnbchan        = 2;          % minimum number of neighborhood channels that is
                                   % required for a selected sample to be included
                                   % in the clustering algorithm (default=0).

https://www.fieldtriptoolbox.org/tutorial/cluster_permutation_timelock/

xref https://github.com/mne-tools/mne-python/issues/4859

sappelhoff avatar May 06 '22 11:05 sappelhoff

Currently I can only assume that in MNE-Python, any neighboring relationship (2 timepoints, or 2 channels) makes up a cluster.

Actually, even a single time point at a single spatial location will be a "cluster". The only thing that matters is that it exceeds the threshold.

Do you need to set minimum spatial/temporal/spatiotemporal "extents" to get reasonable results? For data that are sufficiently smooth, I don't think it should matter much in practice, because single-node "clusters" will have very small counts and stat values (which one you get/use depends on t_power), so they won't be the largest cluster(s) in each permutation used in the end for the maximal statistic.

larsoner avatar May 06 '22 11:05 larsoner

I think you are right with the assessment that "it won't matter" in most cases, and I think for my data presently it also doesn't matter too much.

However, I still remember this parameter from when I was using FieldTrip and would have liked to use it during exploration, as it also controls the number of "observed" clusters.

will have very small counts and stat values (which you get/use depends on t_power)

Now that we are talking about t_power: Is there actually a meaningful value for that param beyond 0 (for count of "extent") and 1 (for sum of cluster stats)?

sappelhoff avatar May 06 '22 11:05 sappelhoff

it also controls the number of "observed" clusters.

This is a pretty easy operation (almost/possibly a one-liner) to do on the observed clusters after the fact, though. So I'd rather not add a parameter for this, but instead maybe a line in a tutorial/example.
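
For illustration, a minimal sketch of such a post-hoc filter on toy data, using out_type='mask' so that each returned cluster is a boolean array (min_extent is just an arbitrary choice here, not an existing parameter):

import numpy as np
from mne.stats import permutation_cluster_test

# Two toy "conditions", each shaped (n_observations, n_times)
rng = np.random.default_rng(0)
X = [rng.standard_normal((20, 50)), rng.standard_normal((20, 50)) + 0.5]

T_obs, clusters, cluster_pv, H0 = permutation_cluster_test(
    X, n_permutations=100, out_type='mask', seed=0)

# Drop observed "clusters" smaller than a user-chosen minimum extent (in samples)
min_extent = 3
keep = [ii for ii, clu in enumerate(clusters) if clu.sum() >= min_extent]
clusters = [clusters[ii] for ii in keep]
cluster_pv = cluster_pv[keep]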

Now that we are talking about t_power: Is there actually a meaningful value for that param beyond 0 (for count of "extent") and 1 (for sum of cluster stats)?

I guess that continuous values between 0 and 1 give a corresponding weight to the count vs stat. Then values greater than 1 give more importance to larger stat values, which at least reminds me of the difference between minimizing squared error versus absolute error / L2 vs L1 norms. But I'd suggest looking at the original papers/refs to see what they say, this is just my intuition from thinking about it.
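
To make that intuition concrete, a tiny toy computation in plain NumPy, mirroring how the t_power docstring describes the cluster score (stat values raised to t_power before summing; the numbers here are made up):

import numpy as np

t_vals = np.array([2.1, 2.3, 5.0])  # made-up stat values of the nodes in one cluster

for t_power in (0, 0.5, 1, 2):
    print(t_power, np.sum(t_vals ** t_power))
# t_power=0   -> 3.0   (pure count / extent)
# t_power=0.5 -> ~5.2  (something in between)
# t_power=1   -> ~9.4  (sum of the stats)
# t_power=2   -> ~34.7 (large stat values dominate, cf. the L2 vs L1 analogy above)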

larsoner avatar May 06 '22 11:05 larsoner

This is a pretty easy operation (almost/possibly a one-liner) to do on the observed clusters after the fact, though. So I'd rather not add a parameter for this, but instead maybe a line in a tutorial/example.

Fair enough! But I also just thought of another reason why such a parameter would be relevant -- I drew a quick sketch (you need to tilt your head to the right, sorry :wink:)

[image: hand-drawn sketch of a thresholded channel x time grid - a "main cluster" with a few thin "finger"-like extensions]

In the drawn scenario, this is 1 single big cluster that has a much larger extent simply because there are a few channel/time "bins" that extend like "fingers" from the "main cluster" ... this even has the potential to connect clusters that would actually be distinct.

With the proposed parameter, such "fingers" could be removed.

I guess that continuous values between 0 and 1 give a corresponding weight to the count vs stat. Then values greater than 1 give more importance to larger stat values, which at least reminds me of the difference between minimizing squared error versus absolute error / L2 vs L1 norms. But I'd suggest looking at the original papers/refs to see what they say, this is just my intuition from thinking about it.

thanks for this intuition!

sappelhoff avatar May 06 '22 12:05 sappelhoff

I've run into this problem with a cluster permutation in the time domain, where like @larsoner said a single time point can become a cluster. I now simply filter those out manually according to a user-defined threshold.

hoechenberger avatar May 06 '22 14:05 hoechenberger

I've run into this problem with a cluster permutation in the time domain, where like @larsoner said a single time point can become a cluster. I now simply filter those out manually according to a user-defined threshold.

I'll make a PR for the study template soon that demonstrates this.

hoechenberger avatar May 06 '22 14:05 hoechenberger

In the drawn scenario, this is 1 single big cluster that has a much larger extent simply because there are a few channel/time "bins" that extend like "fingers" from the "main cluster" ... this even has the potential to connect clusters that would actually be distinct.

This is not the case for your example; according to a standard lattice adjacency, there are four clusters:

>>> import numpy as np
>>> x = np.zeros((6, 5), bool)
>>> x[[0, 1, 2, 3, 3, 4, 4, 5], [0, 1, 4, 2, 3, 2, 3, 2]] = True
>>> 
>>> x
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False, False, False,  True],
       [False, False,  True,  True, False],
       [False, False,  True,  True, False],
       [False, False,  True, False, False]])
>>> from scipy import ndimage
>>> ndimage.label(x)
(array([[1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 3],
       [0, 0, 4, 4, 0],
       [0, 0, 4, 4, 0],
       [0, 0, 4, 0, 0]], dtype=int32), 4)

This is because lattice adjacency (reasonably IMO) is a "cross" not a "filled 3x3 square" in the 2D case.
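
(Continuing the snippet above, a quick check shows that a filled 3x3 structure, i.e. 8-connectivity, would merge the two diagonal singletons with each other and connect the point at (2, 4) to the main block, leaving only two clusters:)

>>> ndimage.label(x, structure=np.ones((3, 3)))[1]  # number of clusters with 8-connectivity
2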

larsoner avatar May 06 '22 14:05 larsoner

This is not the case for your example according to a standard lattice adjacency, there are four clusters:

I see - but does that make my point invalid? :slightly_smiling_face:

Because we could tune my example such that it would result in a single cluster ... and it would still look like something that I'd rather be able to control (via a parameter that tunes a "minimum extent") so that it does not end up as a single cluster.

Full disclosure: I won't have time to do this PR myself at this time. But I still don't think (yet?) that this issue needs to be closed.

sappelhoff avatar May 06 '22 14:05 sappelhoff

Because we could tune my example such that it would result in a single cluster, ... and still look like something that I'd rather control to not be a single cluster (by a parameter that tunes a "minimum extent").

Okay, I'm not sure what you mean / how you want this "min" operation to operate. Can you provide an example of current behavior that would be bad, and what you'd want the parameter to fix? For example, based on how I've been thinking about the problem, I would expect min_spatial=2 or so to produce a cluster with 4 nodes for the thresholded map (time in rows, space in cols):

[[0, 1],
 [0, 1],
 [1, 1]]

But I'm starting to suspect you're saying that you'd want this to produce a cluster with just the last row, which is indeed different from how I've been thinking about the minimum extent... I'm thinking about a minimum applied after forming the cluster, like an "any" operation, and you're thinking about a minimum applied during clustering, like an "all" operation (i.e., the cluster has to maintain a minimum extent of X along dimension Y at all locations, rather than achieve it at any point)...?

Which way does the FieldTrip parameter actually operate?

larsoner avatar May 06 '22 14:05 larsoner

The fieldtrip parameter description says:

cfg.minnbchan        = 2;          % minimum number of neighborhood channels that is
                                   % required for a selected sample to be included
                                   % in the clustering algorithm (default=0).

So I think the default is identical to what we currently have in MNE.

with cfg.minnbchan=1 I would expect that for:

[[0, 1],
 [0, 1],
 [1, 1]]

... at time point 0 (the first row), only a single channel is significant, i.e., that sample has 0 significant neighboring channels -- so it does not go into the cluster (and the same holds at time point 1), and -- as you say -- only the last row will be part of the cluster.

I think this can be helpful if one wants to avoid cases like this (just conceptually):

[[1, 0, 0, 1, 1],
 [1, 0, 0, 1, 1],
 [1, 1, 1, 1, 1]]

where 2 (or more) clusters are "bridged" by small connections (i.e., channels that remain significant for longer, while the rest of the cluster is already n.s. again).

sappelhoff avatar May 06 '22 14:05 sappelhoff

Okay that makes sense. It seems doable in principle given our existing code. I think we can achieve minimums for:

  1. Time points spanned by a single spatial node, by making a preliminary pass over the columns (time series) of the thresholded array to zero out any contiguous run of 1's that is not long enough (see the sketch after this list).
  2. Spatial nodes, by working within the clustering step to drop any "cluster" within a time slice that is not big enough.
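
A minimal sketch of step 1 in plain NumPy/SciPy (enforce_min_t_span and min_t_span are hypothetical names, not existing MNE parameters), operating on a thresholded boolean map with time in rows and space in columns:

import numpy as np
from scipy import ndimage


def enforce_min_t_span(mask, min_t_span=2):
    """Zero out, per spatial node, supra-threshold runs shorter than min_t_span."""
    out = mask.copy()
    for col in range(out.shape[1]):  # loop over spatial nodes
        labels, n_runs = ndimage.label(out[:, col])  # contiguous 1-D runs of True
        for lab in range(1, n_runs + 1):
            run = labels == lab
            if run.sum() < min_t_span:
                out[run, col] = False
    return out


mask = np.array([[0, 1],
                 [0, 1],
                 [1, 1]], bool)
print(enforce_min_t_span(mask).astype(int))
# [[0 1]
#  [0 1]
#  [0 1]]   <- the single-sample run of the first spatial node is culled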

Just to clarify, what should happen in this case for min_t_span=2, min_spatial_span=2?

[[0, 1, 0],
 [1, 1, 1],
 [0, 1, 0]]

Should there even be 1 cluster? I think no, since the singletons should be culled by the algorithm...

As far as the naming goes, we could consider min_t_span=1 (default) rather than min_t_neighbors=0. I at least find thinking about the span more natural than the number of neighbors (even though they just differ by 1).

larsoner avatar May 06 '22 15:05 larsoner

Should there even be 1 cluster? I think no, since the singletons should be culled by the algorithm...

agreed

As far as the naming goes, we could consider min_t_span=1(default) rather than min_t_neighbors=0. I at least find thinking about the span more natural than the number of neighbors (even though they just differ by 1).

I think that most will agree with this as well :+1:

sappelhoff avatar May 06 '22 15:05 sappelhoff

Okay then we just need a volunteer :)

larsoner avatar May 06 '22 15:05 larsoner

I agree that it is useful to have such parameters when doing cluster-based permutation tests. minnbchan can be very useful in avoiding bridges between clusters (i.e., two clusters merged into one because there is a relatively narrow path joining them).

I actually have some code for things like that in my borsar repository (which unfortunately, despite my best intentions, does not contain much good documentation apart from docstrings; but it has good test coverage at least :) ). Unfortunately it does clustering differently from mne - it uses separate functions (both numba and numpy) specialized for common scenarios (like 3d space with channel adjacency (TFR), 2d space with adjacency (ERP), etc.) - so making this code part of mne might not be straightforward. If someone would like to try it out, here is a short example of finding clusters using a minimum number of adjacent channels (min_adj_ch below):

import numpy as np
import borsar


data = np.array([[1, 0, 0, 1, 1],
                 [1, 0, 0, 1, 1],
                 [1, 1, 1, 1, 1]])
adjacency = np.array([[False, True,  False],
                      [True,  False, True],
                      [False, True,  False]])

# %% find clusters
clusters, cluster_stats = borsar.cluster.find_clusters(
    data.copy(), threshold=0.5, adjacency=adjacency, min_adj_ch=1)

This returns two boolean 2d arrays in clusters, because there are two clusters once samples without at least one adjacent supra-threshold sample in the channel dimension are eliminated. There are also higher-level functions for performing the whole cluster-based permutation test.

mmagnuski avatar May 07 '22 21:05 mmagnuski

@mmagnuski this is some really cool stuff you have there in this repo! Don't you think we should take some of those ideas and add them to MNE?

hoechenberger avatar May 07 '22 21:05 hoechenberger

@hoechenberger Thanks :) Yes, I would be happy to add some of this to MNE. I can try adding a few examples to the docs, so you can let me know what would be best / most worthwhile to adapt to mne and move over.

I actually had some issues with building the docs (I was doing it manually, so the online docs are at least 2 years behind, ugh), so if someone could give me a hint on how best to:

mmagnuski avatar May 08 '22 10:05 mmagnuski

Oh I'm not thinking about filling the docs up with yet another set of (too) elaborate examples. I was thinking about getting stuff like your Clusters class to MNE. I'd be happy to do some brainstorming with you there! The biggest problem I currently see with permutation tests in MNE is that the public API is super low-level and this makes it very difficult for users to actually work with this stuff. We need to identify the most common use cases and come up with a higher-level API that will make users' lives easier.

hoechenberger avatar May 08 '22 10:05 hoechenberger

@hoechenberger Yes, that's exactly what I was thinking, but I thought it may be useful to showcase some of the things that can be done with Clusters, for example, to decide what to bring to mne. There are some more advanced features of that object that are quite useful (at least to me), but they either may not fit with the style of mne (for example, clst.plot(time=(0.2, 0.3)) gives a topography averaged over the 0.2 - 0.3 s time range, but clst.plot(time=[0.2, 0.3]) gives two topographies - one at 0.2 and another at 0.3 seconds) or cry for refactoring.

mmagnuski avatar May 08 '22 12:05 mmagnuski

I just have a (maybe naive) question: when I supply 3D data (channels x times x frequencies) ... is the clustering also happening over frequencies? E.g., does a cluster form because at time X, the adjacent freq bins 8 Hz and 9 Hz were both significant at channel Y?

sappelhoff avatar May 09 '22 15:05 sappelhoff

@sappelhoff Yes. Visualisation of such 3d clusters can sometimes be problematic - that's why my Clusters object has various more advanced options that may seem unnecessary or too much magic. :) I'll try to prepare some examples later this week and link in a separate thread so you can all decide what would be worthwhile to bring to mne.

mmagnuski avatar May 09 '22 15:05 mmagnuski

Yes.

thanks! Regarding this I also just found this example, where on top of the sensor adjacency (via mne.channels.find_ch_adjacency), a "lattice" (whatever that is) adjacency is added (via mne.stats.combine_adjacency) for the frequencies.

So I have two questions:

  1. if I don't use "combine_adjacency" with the sensor adjacency and then do a perm cluster test over freqs, will the clustering over frequencies just not happen?
  2. what's a lattice (yes, I could google that), and should we improve the docs on that? Or is that just a "me" problem?

If the answer to 1. is "yes", we also need to improve the docs IMHO

sappelhoff avatar May 17 '22 11:05 sappelhoff

@sappelhoff Lattice adjacency is just a typical adjacency where the element with index N is adjacent to the elements with indices N - 1 and N + 1. So this is what you would expect for dimensions like time or frequency.
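
For illustration, the lattice adjacency for, say, 4 frequency bins is just a matrix with ones on the first off-diagonals (a plain NumPy sketch; MNE itself works with sparse matrices):

import numpy as np

n_freqs = 4
lattice = np.zeros((n_freqs, n_freqs), bool)
idx = np.arange(n_freqs - 1)
lattice[idx, idx + 1] = lattice[idx + 1, idx] = True  # N adjacent to N - 1 and N + 1
print(lattice.astype(int))
# [[0 1 0 0]
#  [1 0 1 0]
#  [0 1 0 1]
#  [0 0 1 0]]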

With respect to your questions:

if I don't use "combine_adjacency" with the sensor adjacency and then do a perm cluster test over freqs, will the clustering over frequencies just not happen?

I'm not sure what will happen then, but if you are clustering over channels, time and frequency in mne, then you should use combine_adjacency (for 2d data you can just use the respective mne clustering function, I think). If in such a scenario you pass only the channel adjacency (without combining it with information for the other dimensions), then an error should be raised, I think, because otherwise you would get weird results. In such 3d scenarios I have recently used almost exclusively the code I have in borsar / sarna, so I may not remember the default mne behaviour very well.

what's a lattice (yes, I could google that), and should we improve the docs on that? Or is that just a "me" problem?

Improving the docs is almost always a good idea if something is not clear. :) In this case - definitely; we should make it clear what we mean by lattice and that combine_adjacency is necessary in 3d scenarios such as channels x time x frequency.
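
As a sketch of that channels x time x frequency case (the biosemi64 montage and the sizes below are arbitrary example choices; the argument order has to match the non-observation dimensions of the data, here assumed to be (n_observations, n_times, n_freqs, n_channels)):

import mne
from mne.stats import combine_adjacency

# Sensor adjacency for a standard cap, chosen just for illustration
ch_adjacency, ch_names = mne.channels.read_ch_adjacency('biosemi64')

n_times, n_freqs = 50, 10  # example sizes of the time and frequency dimensions
adjacency = combine_adjacency(
    n_times,       # an int creates a lattice adjacency along that dimension
    n_freqs,
    ch_adjacency,  # the sensor adjacency matrix
)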

mmagnuski avatar May 17 '22 12:05 mmagnuski

I'm not sure what will happen then, but if you are clustering in channels, time and space in mne then you should use combine_adjacency

Interesting! Looking at this example of 3d data (observations X time X chs), only sensor adjacency is defined (without doing a "combine adjacency" to add "time adjacencies" as a lattice), and still it looks like clustering is happening over channels AND time (as I would expect!)

And that example is using spatio_temporal_cluster_test instead of permutation_cluster_test ... or is that what you are referring to with

for 2d data you can just use respective clustering mne function, I think

?

(side note: I think it's highly confusing to have these two functions in our public API, where one of them is a "wrapper" to make the other one "simpler")

I also think that choosing "sparse matrix" as the user-facing format for adjacency is not optimal. I don't really know what to make of it. I could understand a list of matrices with length == n_dimensions in the test, where each item refers to one dimension. For example, for channels it'd be an nchs X nchs matrix of zeros and ones marking the connections.

Improving the docs is almost always a good idea if something is not clear. :) In this case -definitely, we should make it clear what we mean by lattice and that combine_adjacency is necesarry in 3d scenarios such as channels x time x frequency.

okay, cool that we are on the same page here. I plan to do a small PR soon where I improve the docs on all aspects that I think I understood now (won't be much, but better than nothing).

sappelhoff avatar May 17 '22 13:05 sappelhoff

By 3d I meant three dimensions for each observation. Yes, spatio_temporal_cluster_test is the function I meant for 2d data with channel adjacency (ERPs, spectra, etc.). I agree that having multiple different clustering functions is confusing. And I also agree that returning a sparse matrix is not great (for channel space I always use .toarray()). I think it is sparse mostly for source-space efficiency (thousands of points, but each neighbouring only a few of them).
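
(For reference, getting a dense 0/1 channel adjacency is then just scipy's .toarray(); biosemi64 again only as an example:)

import mne

ch_adjacency, ch_names = mne.channels.read_ch_adjacency('biosemi64')
dense = ch_adjacency.toarray()  # (n_channels, n_channels) array of 0/1 connections
print(dense.shape)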

mmagnuski avatar May 17 '22 16:05 mmagnuski

I second that; those sparse matrices are kinda odd to me as a user

hoechenberger avatar May 17 '22 19:05 hoechenberger

Interesting! Looking at this example of 3d data (observations X time X chs), only sensor adjacency is defined (without doing a "combine adjacency" to add "time adjacencies" as a lattice), and still it looks like clustering is happening over channels AND time (as I would expect!)

Indeed, the spatio_temporal_cluster_(1samp_)test functions automatically add lattice connectivity along the time dimension based on the max_step parameter, as long as the value passed for adjacency has the correct shape (I think it's using combine_adjacency under the hood).
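
A runnable sketch of the call in question, with toy random data (the sizes, the biosemi64 adjacency, and max_step=1 are arbitrary example choices, not taken from the linked example):

import numpy as np
import mne
from mne.stats import spatio_temporal_cluster_test

ch_adjacency, ch_names = mne.channels.read_ch_adjacency('biosemi64')
rng = np.random.default_rng(0)
n_obs, n_times, n_channels = 15, 20, len(ch_names)
# Two toy "conditions", each shaped (n_observations, n_times, n_channels)
X = [rng.standard_normal((n_obs, n_times, n_channels)) for _ in range(2)]

# Only the channel adjacency is passed; adjacency along time is added internally,
# controlled by max_step (how many time steps apart samples may still connect)
T_obs, clusters, cluster_pv, H0 = spatio_temporal_cluster_test(
    X, adjacency=ch_adjacency, max_step=1, n_permutations=100, seed=0)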

drammock avatar May 20 '22 18:05 drammock

Hello all, so what's the plan here on how to proceed? I believe tackling this will be a major and widely applauded improvement. Maybe we could create some sort of milestone/task/todo list and distribute tasks? :)

hoechenberger avatar Jun 11 '22 21:06 hoechenberger

To me this sounds like one PR that adds this feature, as Eric described in https://github.com/mne-tools/mne-python/issues/10604#issuecomment-1119722416 (although that comment only describes time and space, and we should also implement this for frequencies)

sappelhoff avatar Jun 12 '22 21:06 sappelhoff

I would be interested in working on this during the code sprint.

kimcoco avatar Jun 27 '22 06:06 kimcoco