bids-specification [ENH] Add reference volumes to common imaging derivatives

It was discovered that fmriprep, aslprep and qsiprep all generate <suffix>ref volumes for their respective input modalities. Here is an initial proposal. I do not think we need any additional metadata over the universally RECOMMENDED Description and OPTIONAL Sources.

I kept space-<label> and res-<label> but not den-<label>, since it does make sense to resample these into target spaces, possibly at resolutions differing from the input images, but there doesn't seem to be a clear analog for surface meshes. I suppose you could theoretically sample the boldref to a surface as a diagnostic, but I figure we should let the use case come up before specifying it.

Contradicting that, I threw cbvref in there, since it's functional data like bold. I have no clue how that's processed, but it seems reasonable to say "if you want a reference file, call it cbvref". In any case, there's nothing stopping modalities from adding another suffix that in practice they use as a reference volume. It is just useful to have a name for the thing that's not an actual average image but is some attempt to make a good registration target from what we are given.
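
As a rough illustration of the naming described above, here is a minimal Python sketch that splits a derivative filename into entities and a suffix, then checks that the suffix is one of the reference suffixes and that den-<label> is absent. The suffix set, the example filename and the helper name `parse_reference_name` are assumptions made for illustration; they are not read from the BIDS schema or from any tool's output.

```python
# Hypothetical helper; the suffix set below is assumed from this proposal,
# not read from the BIDS schema.
REFERENCE_SUFFIXES = {"aslref", "boldref", "cbvref", "dwiref"}


def parse_reference_name(filename):
    """Split 'key-value_..._suffix.ext' into (entities, suffix)."""
    stem = filename.split(".", 1)[0]  # drop .nii.gz or a similar extension
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    if suffix not in REFERENCE_SUFFIXES:
        raise ValueError(f"not a reference suffix: {suffix}")
    if "den" in entities:
        # The proposal keeps space-<label> and res-<label> but not den-<label>.
        raise ValueError("den-<label> is not proposed for reference volumes")
    return entities, suffix


# The example filename is made up for illustration.
entities, suffix = parse_reference_name(
    "sub-01_task-rest_space-MNI152NLin2009cAsym_res-2_boldref.nii.gz"
)
print(suffix, entities["space"], entities["res"])
```

A real validator would take the allowed entities and suffixes from the schema rather than hard-coding them as this sketch does.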

Closes #1532.

Jun 28 '23 20:06 effigies

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (6d7eb0f) 87.83% compared to head (ae392b9) 87.83%. Report is 134 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1533   +/-   ##
=======================================
  Coverage   87.83%   87.83%           
=======================================
  Files          16       16           
  Lines        1356     1356           
=======================================
  Hits         1191     1191           
  Misses        165      165


Jun 28 '23 20:06 codecov[bot]

In the current proposal, the filename suffix is the concatenation of two items: some other pre-existing suffix indicating the contrast present in some other image, and "ref". I'm going to throw a spanner in the works and question whether this should be the case, despite some limited precedent.

Not all cases of "3D reference images generated from a > 3D dataset for the purpose of registration" are faithful to the suffix from which they may have originated. For instance:
1. A mean b=0 from DWI explicitly has no diffusion-weighting; it's just a T2-weighted image.
2. The reference volume used for registration of DWI data could look nothing like raw DWI data. Could be an FA image (or anything else that could have its own more appropriate filename), could be a pseudo-T1w image like here or here, could be anything else hypothetically.
This is mixing up image content with processing. Pre-BIDS, if I were to point within a dataset at a particular file, where that file was generated by taking the mean of an fMRI time series, and I were to describe those data to a third party, I would not say "that's the fMRI reference volume used for registration".
1. If describing the data themselves, which is IMO the purview of BIDS, I'd say that the best descriptor of that image is the mean statistic taken along the 4th axis of the BOLD timeseries. See eg. https://github.com/bids-standard/bids-bep016/issues/61 (even though I've not convinced myself that that's the right way to go).
2. If describing how the data were used, ie. specifically as the reference volume for registration in a previously completed processing pipeline, that is IMO provenance.
3. If describing what the data are intended to be used for, ie. here's an image that this pipeline thinks would serve well as a reference volume for registration in some subsequent pipeline that uses this derivative dataset as an input, then:
  1. It should be up to that pipeline to decide what of the derivatives available it wants to use for that purpose;
  2. That falls in the domain of intended utilisation in processing. This I spoke about in https://github.com/bids-standard/bids-2-devel/issues/53 as I think it should be separated from the data as much as possible.
In the worst case scenario, this could lead to a doubling of the number of suffixes, despite a desire to minimize such.

Jun 29 '23 00:06 Lestropie

The purpose of these volumes is for performing, applying and evaluating registrations. They do not map any particular physical quantity, they cannot necessarily be described as a voxel-wise summary statistic of the original series, and two tools generating references do not need to generate the same reference. They are useful for provenance and post-pipeline reuse. For example, when attempting to find a minimal set of derivatives needed to regenerate the rest, this is a critical one.

I'm not sure if your position is that tools should not create these reference volumes, should have heterogeneous names that more closely reflect the contents, or should shove these files in .bidsignore. I think you may be trying to achieve a purity in BIDS that is not really possible.

> Not all cases of "3D reference images generated from a > 3D dataset for the purpose of registration" are faithful to the suffix from which they may have originated.

I wrote "A reference volume is a 3D image that is used to represent a 4D series," which is specifically not intended to indicate that the file was generated from the original 4D dataset or that the resulting contrast matches that suffix.

> This is mixing up image content with processing.

I think this derivative is both valuable and inextricable from processing.

> I'd say that the best descriptor of that image is the mean statistic taken along the 4th axis of the BOLD timeseries

That's not what boldref is. I don't know if it is what aslref or dwiref is. I would say you are welcome to use stat-mean for images where that applies.

> In the worst case scenario, this could lead to a doubling of the number of suffixes, despite a desire to minimize such.

This is hyperbolic. There are currently 5 explicitly 4D suffixes (asl, bold, cbv, dwi, pet) where this would clearly apply, and three of them have tools that already do this. For sensor-based modalities, I don't know if there's an equivalent. If so, I would honestly be willing to move this to common derivatives as a principle, because it is so useful.

I also don't see minimizing suffixes as an explicit goal so much as a heuristic for finding cases where a few suffixes and one or two entities might replace many suffixes. I have very little concern about boldref taking a suffix that might be used for something else in another context, while something like mean could apply in many cases and it would be good not to claim it for a very narrow case.

Jun 29 '23 11:06 effigies

Not entirely sure what the most precise terms are in all this.

I agree with what Robert says about the "reference" for DWI registration potentially being a non-diffusion volume. So naming, e.g., a population T1w average (a structural derivative) a DWI reference ("dwiref") seems misleading to me.
However, I grant that the description in https://github.com/bids-standard/bids-specification/pull/1533/files#diff-83476cde0b0492fc6c54a092949c7f9bf89f2cf71343ee5f400322aaeab3254cR156 does not imply that the dwiref is necessarily used ("(...) often used (...)") for registration of DWI volumes.
To me, the precise terms for registration purposes would be "static" and "moving" images, regardless of the modality (of the "source" and "target" terms, the first may be misleading within BIDS (?), or less clear (?)). Defining what the "static" image should be, or how it should be computed, maybe belongs to "what the data are intended to be used for", following the terms Robert used.
Strictly speaking, I'd say that the reference in DWI would be the $S_{0}$ volume (i.e. "b0"/"b=0"; I am not sure about Matt's comment in https://github.com/bids-standard/bids-specification/issues/1532#issuecomment-1611919421 about these being different). So when using the term "reference" to name, e.g., a T1w image for a DWI volume, maybe we are being too "generous" with the terms.

Sorry for chiming in/if the above comment does not help in reaching an agreement.

Jul 06 '23 13:07 jhlegarreta

> Should Sources be added as a RECOMMENDED metadata field? I know it's OPTIONAL for derivatives, but, as with masks, it seems particularly relevant for reference volumes.

I would be fine with promoting it to RECOMMENDED.

> the precise terms for registration purposes would be "static" and "moving" images, regardless of the modality

Not sure that that's very useful here, as one process's static image is another process's moving. In the case of a boldref, it is static for motion correction and moving for coregistration. I really think "reference" is a useful term, as it is an image with a definite affine and grid that stands in for any other images that are aligned with it. Its contents need have no meaning apart from their usefulness in registration. If there's a more precise term for this type of image, I'm happy to use it, but it should not be dependent on the direction of registration.
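
To make "a definite affine and grid" a little more concrete, here is a minimal Python sketch of one way to test whether a 3D reference can stand in for a 4D series: the spatial shapes must match and the voxel-to-world affines must agree within a tolerance. The `Grid` class, the `same_grid` function and the tolerance are assumptions for illustration; an actual tool would read shape and affine from the image headers.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Grid:
    shape: tuple   # full array shape, e.g. (i, j, k) or (i, j, k, t)
    affine: tuple  # 4x4 voxel-to-world matrix as nested tuples


def same_grid(a, b, tol=1e-5):
    """True if the first three dimensions and the affines agree."""
    if a.shape[:3] != b.shape[:3]:
        return False
    return all(
        isclose(x, y, abs_tol=tol)
        for row_a, row_b in zip(a.affine, b.affine)
        for x, y in zip(row_a, row_b)
    )


# Made-up 2 mm isotropic grid for illustration.
affine = (
    (2.0, 0.0, 0.0, -90.0),
    (0.0, 2.0, 0.0, -126.0),
    (0.0, 0.0, 2.0, -72.0),
    (0.0, 0.0, 0.0, 1.0),
)
reference = Grid(shape=(64, 64, 40), affine=affine)
series = Grid(shape=(64, 64, 40, 200), affine=affine)
print(same_grid(reference, series))
```

Nothing in this check depends on which image is moved during registration, which is the property argued for above.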

For what it's worth, the original inspiration for boldref was the single-band reference (SBRef) from the Human Connectome Project, which has the property of being a useful stand-in for the BOLD series while not being derived from the series.

Jul 06 '23 16:07 effigies

> I think you may be trying to achieve a purity in BIDS that is not really possible.

I do that. You may have noticed more generally. :-P I've had plenty of experience of blurring of logical concepts leading to problems in software design / communication, so will advocate for the cleanest separation even if the consequences of failing to do so aren't clear and won't manifest for years. But I have no expectation of always getting my way. Just trying to provide insights based on that experience and seeing which of them those of authority agree with.

> I wrote "A reference volume is a 3D image that is used to represent a 4D series," which is specifically not intended to indicate that the file was generated from the original 4D dataset or that the resulting contrast matches that suffix.

I think refining this might provide some guidance / consensus. "Used to represent a series" is very vague.

Which of the following is the case?
1. "The reference image" is "the image that is used for registration".
2. "The reference image" is something like "a dataset possessing high contrast, potentially of reduced dimensionality with respect to the original dataset, that best localises the spatial position and internal structure of the data content", for which potential applications include registration but also things like visualisation.
This might influence choice of suffix / description.
With respect to the choice of suffixes, I had mentioned the (unofficial?) policy of minimising new suffixes. The current proposal I suppose I would place in the intermediate range in this respect: it's greater than one, and less than the total possible number of different image contrasts. But the generation of those new proposed suffixes introduces two potential problems:
1. The image content may look nothing like what the suffix describes (eg. dwiref; as mentioned earlier above)
2. In most of these cases, the imaging modality (eg. "dwi") will already be indicated by the directory in which the file resides, and so being a part of the suffix also will be redundant.
The reason I re-raise the first problem and add the second is that there's an alternative option: introduce only one new suffix. The modality of such a file would be inferred from its directory location. Looks like @tsalo mentioned and discarded it, but I think it's worth considering. The data from which it was generated would ideally be encoded via provenance, but it might be appropriate to also define a metadata field, mandatory for data files with this suffix, that lists those data files for which it provides such a spatial reference. The stumbling block I see for this approach is where there are two such references generated for a single imaging modality. On first consideration, these could be disambiguated at the file level using _desc-<label> and more precisely from the metadata field above. But this is beyond my personal experience, so I'm curious to know if someone thinks this would be more fundamentally broken.
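
A minimal Python sketch of how that single-suffix alternative might behave is below. Everything named in it is hypothetical: the thread names neither the suffix nor the metadata field, so `ref` and `ReferenceFor` are placeholders, and the paths and desc labels are made up.

```python
from pathlib import PurePosixPath

# Both names are hypothetical placeholders; neither the single suffix nor
# the metadata field has been named in this thread.
SINGLE_SUFFIX = "ref"
REQUIRED_FIELD = "ReferenceFor"


def infer_modality(path):
    """Take the modality from the directory that holds the file."""
    return PurePosixPath(path).parent.name


def check_sidecar(sidecar):
    """Require a non-empty list of the files this reference stands in for."""
    targets = sidecar.get(REQUIRED_FIELD)
    if not isinstance(targets, list) or not targets:
        raise ValueError(f"{REQUIRED_FIELD} must list at least one file")
    return targets


# Two references for one modality, told apart by desc-<label>.
paths = [
    f"sub-01/dwi/sub-01_desc-b0_{SINGLE_SUFFIX}.nii.gz",
    f"sub-01/dwi/sub-01_desc-pseudoT1w_{SINGLE_SUFFIX}.nii.gz",
]
sidecar = {REQUIRED_FIELD: ["sub-01/dwi/sub-01_desc-preproc_dwi.nii.gz"]}

for path in paths:
    print(infer_modality(path), check_sidecar(sidecar))
```

The sketch shows the trade-off: the filename alone no longer says which series the image represents, so that information has to come from the directory and the sidecar.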

And of course it would not be compatible with some BIDS Apps that have already made their own decisions on how to export such images; but as you say, I'm a purist, so I'd prefer to contemplate the decision rather than relying on precedent that may itself have come from just satisfying an immediate requirement. The relationship to SBRef makes sense, and I'd kind of guessed as much already. There "single-band" conveys a lot more about the expected image content, relating to 2.i. above.

> the precise terms for registration purposes would be "static" and "moving" images, regardless of the modality

"Static" and "moving" are best avoided here, since they imply an asymmetric registration, which is not always the case. That's also tying even more strongly to actual processing, as opposed to data content or even intent of processing, which I've expressed my objection to above.

Aug 30 '23 07:08 Lestropie
