nibabel icon indicating copy to clipboard operation
nibabel copied to clipboard

ENH: Add option to infer CIFTI-2 intent codes

Open mgxd opened this issue 5 years ago • 19 comments

Adds support for FileBasedImage methods to_filename and instance_to_filename to support keyword args.

Adds keyword arg validate to CIFTI's to_filename and instance_to_filename methods.

  • If False (default), the intent code will not be set (except when the intent code is "none")
  • If True, the first suffix of the output filename will be compared to valid suffixes from within the CIFTI-2 specification. If there is a match, the corresponding intent code will be set, and each IndexMap within the header matrix is checked.

mgxd avatar Jul 06 '20 13:07 mgxd

Codecov Report

Merging #932 into master will decrease coverage by 0.06%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #932      +/-   ##
==========================================
- Coverage   91.91%   91.84%   -0.07%     
==========================================
  Files          99       99              
  Lines       12514    12536      +22     
  Branches     2204     2208       +4     
==========================================
+ Hits        11502    11514      +12     
- Misses        678      684       +6     
- Partials      334      338       +4     
Impacted Files Coverage Δ
nibabel/cifti2/cifti2.py 96.81% <100.00%> (+0.11%) :arrow_up:
nibabel/filebasedimages.py 85.80% <100.00%> (ø)
nibabel/loadsave.py 92.59% <100.00%> (ø)
nibabel/environment.py 75.00% <0.00%> (-20.00%) :arrow_down:
nibabel/casting.py 85.47% <0.00%> (-0.86%) :arrow_down:
nibabel/dft.py 80.36% <0.00%> (-0.62%) :arrow_down:
nibabel/nifti1.py 92.26% <0.00%> (-0.31%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 917afab...af7265c. Read the comment docs.

codecov[bot] avatar Jul 06 '20 13:07 codecov[bot]

@effigies Totally forgot about this PR - I liked your suggestion in https://github.com/nipy/nibabel/pull/932#discussion_r468995046 so I went ahead and added that. Since that now includes IndexMap information, I reworked the original infer_intent kwarg into validate, which checks for the presence of the expected IndexMaps (and also the correct order) on top of setting the intent code.

mgxd avatar Oct 02 '20 20:10 mgxd

I believe the failing tests are unrelated - this is ready for review.

mgxd avatar Oct 05 '20 17:10 mgxd

Yup, opened https://github.com/pydicom/pydicom/issues/1214 for those failures.

effigies avatar Oct 05 '20 18:10 effigies

Overall, I'm not sure I like this UX. If I try to save a generic_cifti.nii without validate=False, this is going to give me a KeyError.

My inclination is that we want warnings, not errors, by default. And inferring intent codes and validating axes feel like they shouldn't be controlled by the same parameter. @MichielCottaar @satra Would be interested in your opinions (or anyone else who uses CIFTI a lot) on how we can be helpful without being burdensome.

I haven't checked this, but I suspect that the logic will fail if someone tries to save a compressed .nii.gz, which we've previously allowed, even though CIFTI strongly discourages it.

effigies avatar Oct 16 '20 17:10 effigies

Thanks for looking into these intent codes and filename extensions. I would suggest to turn around the logic to use the map types as a ground truth of what the user intends rather than the file extension. At least for the current CIFTI file types, it is possible to determine what the intent code should be for any set of map types. I think such an inference could just added in without most users noticing (ideally it would not overwrite any intent code explicitly set by the user).

I do like the idea of also checking that the filename extension matches what is being stored. However, I agree with effigies that it is probably better to keep this as a warning.

MichielCottaar avatar Oct 17 '20 20:10 MichielCottaar

I think using the axis types as a more authoritative indicator of intent than the extension makes a lot of sense. So something like:

try:
    expected_intent = CIFTI_CODES.niistring[axis_types]
    expected_ext = CIFTI_CODES.extension[axis_types]
except KeyError:
    expected_intent = "NIFTI_INTENT_CONNECTIVITY_UNKNOWN"
    expected_ext = ".nii"

if validate and expected_intent != found_intent:
    warn
    set intent

if validate and not fname.endswith(expected_ext):
    warn

This logic is simplified, since axis type tuples are not unique determinants. (NIFTI_INTENT_CONNECTIVITY_DENSE_SERIES applies to dtseries.nii and dfan.nii, (CIFTI_INDEX_TYPE_SCALARS, CIFTI_INDEX_TYPE_BRAIN_MODELS) applies to dscalar.nii and dfan.nii, and dfibersamp.nii and dfansamp.nii share both intents and axes.)

This would also treat .nii.gz the same as a mismatched .<subext>.nii, and warn, but not fail, which feels a bit kinder to users.

With this mode, I think I'm okay with controlling both behaviors with a single parameter. validate seems fine to me...

effigies avatar Oct 18 '20 02:10 effigies

@effigies - your plan looks reasonable. the one thing perhaps to think a bit more about is exception vs warning. warnings in python are mostly useless, i think, in dataflows and large scale processing, which is where such mismatches are likely to show up. the question is whether a reasonable assumption is being made by the tool when processing when ignoring a warning (since warnings are never trapped by downstream code only filtered out).

satra avatar Oct 18 '20 13:10 satra

That's a fair point. What would be a case in which you'd want to see an error? Trying to think through what the default behaviors should be. I think there should be something between "error" and "no helping", so if we do want to promote these checks to exceptions, then I'm back to wanting separate switches for inferring the intent code and performing validation.

effigies avatar Oct 18 '20 13:10 effigies

this is probably going to take this to a different PR/problem space, so first i will say that true, false, warn/except is a good starting way of handling this.

so the simple use cases of matrix dimension flips and others should be choices for exception (e.g., timeseries by vertices vs vertices by timeseries). although i'm not sure as i write this if cifti actually enforces this or hcp-flavored cifti uses this as convention (i think the latter).

more general approach, let validate take a callable, and one could build up an hcp-flavored validator (this can check on extension to internal mismatches, dimension mismatches, even have heuristics for certain types of information depending on intent codes).

most algorithms assume that the file given to it is the correct file instead of checking to see if it is. for example, i load a dtseries and i assume that the dimensions are vertices/voxels by time. but it may be impossible for nibabel in general to know what they intent of the algorithm/tool designer was. so let the tool designer decide what is valid for their application and nibabel, if it can validate basic things like dimensions, availability of appropriate metadata (e.g., a dtseries should have a samplingstep inside the header that is reasonable).

satra avatar Oct 18 '20 15:10 satra

@MichielCottaar @satra thanks for the reviews 😄

I would suggest to turn around the logic to use the map types as a ground truth of what the user intends rather than the file extension

I like this!

warnings in python are mostly useless, i think, in dataflows and large scale processing, which is where such mismatches are likely to show up

This was my first thought when creating this, but looking back, it may be too strict of an enforcement - I'm fine with reducing the severity of "incorrect" CIFTIs to a warning. I think warning handling is important, but that can (should?) be handled by the specific workflow (i.e. using warnings.simplefilter to treat them as errors).

mgxd avatar Oct 19 '20 17:10 mgxd

@mgxd Are you shooting to get this in for the release?

effigies avatar Oct 19 '20 19:10 effigies

yes, I was hoping to - but this failing test reveals a problem with the approach of looking up based on map configuration.

(CIFTI_INDEX_TYPE_SCALARS, CIFTI_INDEX_TYPE_BRAIN_MODELS) applies to dscalar.nii and dfan.nii

when trying to validate the configuration for a *dscalar.nii file, the information for *dfan.nii is fetched. How can we avoid these collisions?

mgxd avatar Oct 19 '20 20:10 mgxd

We might need to modify recorders to return multiple options if there are multiple matches.

effigies avatar Oct 19 '20 20:10 effigies

I think we'll need to push this off to another release. I've held up 3.2.0 for too long and almost certainly won't have time to review more this week.

effigies avatar Oct 20 '20 12:10 effigies