nltools
nltools copied to clipboard
stats.threshold usage is confusing
Thresholding method on Brain_Data objects vs the threshold function behave differently. @ljchang is this intended behavior? For example, a common workflow is to:
- Compute a stat map and an associate pmap (e.g. via permutation testing)
- Determine a threshold via mc correction (e.g. fdr)
- Mask the stat map using the the pmap that < threshold
However, threshold changes the shape of Brain_Data rather than simply masking out non-surviving voxels like the Brain_Data.threshold method. This make it impossible to append images to each other for storage or other use.
We should talk about this and simplify and consolidate. Just took a peak at this and one thing I'm concerned about with Brain_Data.threshold is the coerce_nan flag is set to True by default.
b.data = np.nan_to_num(b.data)
My guess is that this was added to deal with nans and to keep the shape the same, but this will be an issue depending on what the values of the map you are trying to threshold. This option assumes that values will be centered at zero, which will not always be the case.
stats.threshold looks like it uses apply_mask, which for some reason by default casts the mask into a nibabel instance and then uses nifti_masker to go back into the same space as the data to be masked. My guess is that this is to deal with use cases where mask is of a different shape than the data, which is fairly common. We might want to do a quick check if they are different shapes and only do this if they are different shapes, otherwise we will want to just do a straight boolean mask, which will be much faster.
@ejolly I'm not quite sure why it would be changing the shape of Brain_Data though. Any ideas?
Actually, this might be a bug where nifti_masker = NiftiMasker(mask_img=mask) should be nifti_masker = self.NiftiMasker(mask_img=mask). Actually, this whole code after this could be simplified as it should just be projecting mask into self space. @ejolly let me know if you agree and we can fix.
This is related to #310.
@ljchang I think it's changing the shape for exactly the reason you suggested. The threshold function makes a call to Brain_Data.apply_mask which always changes the shape of the output (as it's intended to). However, the Brain_Data.threshold method operates on the Brain_Data.data numpy array directly, always returning back the same shape.
I guess the broader decision is, that there are two different kinds of "masking" operations a user wants to perform and we don't cleanly differentiate between the two:
- Set all values in an existing
Brain_Dataobject to 0 unless they're inside of a mask. This is analogous to adjusting a threshold in fsleyes or xjview for spm. It's currently whatBrain_Data.thresholddoes, and I believe what a user wants if they make a call to thethresholdfunction directly (e.g. masking out t-stats based on p-values and a cutoff) - Subset a
Brain_Dataobject so you only get back the voxels inside of a provided mask, thereby changing the shape. This is currently whatBrain_Data.apply_maskdoes and is super useful and a bit unique to our tools because it makes working with ROI data directly very convenient
One possible delineation is this:
- For masking operations in the style of 1. above, we make the
Brain_Data.thresholda bit more flexible. If a user provides a file path or anotherBrain_Datainstance (e.g.t_brain.threshold(p_brain, thresh=.05)just makes a call to thethresholdfunction (like our stats methods) which performs the logic of creating a boolean mask and always returning back something of the same shape. If a user doesn't provide a file path or anotherBrain_Datainstance, (e.g. they want percentile or value-based thresholding), it does what it currently does and just sets all values that are outside the threshold to 0, still returning something of the same shape - For subsetting operations in the style of 2. above, we can use my suggestion in https://github.com/cosanlab/nltools/issues/309 and just perform straight numpy boolean indexing if the mask is of the same shape as the
Brain_Datainstance, otherwise proceed with theNiftiMaskerlogic in the existing method code, thereby always changing the shape of the return object to be the same size as that of the mask.
How does something like that sound?
Ugh getting burned by this again. We should really standardize behavior across stats.threshold and Brain_Data.threshold. Here's an example function that doesn't change the shape of the output and behaves likeBrain_Data.threshold:
def thr_brain_keep_size(stat, p, thr=0.05):
"""Just like threshold from nltools.stats but keeps the data shape"""
from copy import deepcopy
out = deepcopy(stat)
out.data[p.data > thr] = 0
return out
@ljchang Few issues with students at MIND encountering this again. I'd strongly advise to replace nltools.stats.threshold with the the function above.
Is there a situation I'm missing where we want Brain_Data.shape to change after using threshold(Brain_Data)?