Transforms-only and apply-transforms modes

Open effigies opened this issue 4 years ago • 4 comments

For very large datasets (10k+ subjects), the cost of storing even a minimal set of derivatives for each subject can become large. Internally, we delay transforming data in order to do as much as possible in a single shot, reducing the number of interpolations. It should therefore not be very difficult to output all transforms and few if any other derivatives with something like a --transforms-only flag. The user can then construct the needed volumes and time series on the fly, or we could provide an --apply-transforms mode to fully populate a subject directory.

This would be enabled by the X5 transform format, allowing us to store the head-motion-correction transforms for an entire series as a step in a chain from BOLD to template space. I'm not sure if there's an existing format that something like antsApplyTransforms could use; we currently split, apply, and merge.
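To illustrate the "split, apply, merge" pattern with a single interpolation per volume, here is a minimal sketch. It is not fMRIPrep code: it assumes every transform is a 4x4 affine in voxel coordinates that maps target coordinates to source coordinates (the convention `scipy.ndimage.affine_transform` expects), and the names `translation`, `resample_series`, `hmc_pull` and `ref_from_target` are hypothetical. Real pipelines also chain nonlinear warps, which this sketch ignores.

```python
import numpy as np
from scipy.ndimage import affine_transform


def translation(shift):
    """Return a 4x4 homogeneous matrix translating by `shift` voxels."""
    mat = np.eye(4)
    mat[:3, 3] = shift
    return mat


def resample_series(bold, hmc_pull, ref_from_target, order=1):
    """Split a 4D series, resample each volume once, and merge the result.

    bold            : array of shape (x, y, z, t)
    hmc_pull        : one 4x4 matrix per time point (reference -> volume t)
    ref_from_target : 4x4 matrix (target grid -> BOLD reference)
    """
    out = np.empty_like(bold, dtype=float)
    for t in range(bold.shape[-1]):  # split
        # Compose the chain first, so each volume is interpolated only once.
        composite = hmc_pull[t] @ ref_from_target
        out[..., t] = affine_transform(bold[..., t], composite, order=order)  # apply
    return out  # merge
```

Storing only `hmc_pull` and `ref_from_target` (a few numbers per volume) is what would let the resampled series be rebuilt on demand instead of being kept on disk.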

I list this as medium impact. I think it would be low value for moderately sized datasets, but extremely valuable for very large datasets.

cc @shotgunosine @mih for thoughts.

effigies, Jul 06 '20 17:07

Yeah, I think this would be really helpful for efforts to share fmriprep derivatives in the case that people will be downloading that data and running subsequent processes themselves. If the use case is that subsequent processes will happen in the cloud, the benefit will depend on the processing requirements of the apply transforms operation.

Off the top of my head, the only step where this might not work is slice-time correction. @effigies is that also handled with transformations that end up applied in a single step? Could also be tricky with multi-echo.

Shotgunosine, Jul 06 '20 20:07

> If the use case is that subsequent processes will happen in the cloud, the benefit will depend on the processing requirements of the apply transforms operation.

It seems likely you'll want some level of caching, but

> Off the top of my head, the only step where this might not work is slice-time correction. @effigies is that also handled with transformations that end up applied in a single step?

Yeah, STC is done separately, but should be deterministic. I don't think there's any fundamental reason that STC couldn't be included as part of the X5 chain, but transforms with temporal components might (apart from one transform per time point) not be specified yet.

> Could also be tricky with multi-echo.

I suspect the combination could be represented as a voxel x echo weight matrix, which would make it not too far from a displacement field. But that's a guess based on a very qualitative understanding of ME.
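As a rough illustration of the "voxel x echo weight matrix" idea, the sketch below combines echoes with per-voxel weights. It is an assumption-laden example, not how fMRIPrep does it: the function names are hypothetical, the data are assumed to be flattened to (voxels, echoes, time), and the weighting `TE * exp(-TE / T2*)` is just one commonly used choice for optimal combination.

```python
import numpy as np


def t2star_weights(echo_times, t2star):
    """Per-voxel echo weights, shape (n_voxels, n_echoes), rows summing to 1.

    Assumes the weighting TE * exp(-TE / T2*), one common choice.
    """
    te = np.asarray(echo_times, dtype=float)[np.newaxis, :]
    t2s = np.asarray(t2star, dtype=float)[:, np.newaxis]
    raw = te * np.exp(-te / t2s)
    return raw / raw.sum(axis=1, keepdims=True)


def combine_echoes(data, weights):
    """Weighted sum over echoes: (voxels, echoes, time) -> (voxels, time)."""
    return np.einsum("ve,vet->vt", weights, data)
```

Under these assumptions the weight matrix is the only thing that would need to be stored to reproduce the combined series from the individual echoes.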

effigies, Jul 06 '20 21:07

> Yeah, STC is done separately, but should be deterministic. I don't think there's any fundamental reason that STC couldn't be included as part of the X5 chain, but transforms with temporal components might (apart from one transform per time point) not be specified yet.

Including STC at once would be really nice, but at this point I see it as very far in the future. To be able to include it directly in the resampling, we would need a way of interpolating through time too, which is not currently available through scipy (and I honestly don't know right this minute which interpolating kernel you should use).

oesteban, Aug 09 '20 07:08

Posting to register personal investment in this functionality, and hopefully to draw an update on what would be required to contribute, given any changes to transformation handling that have happened since the issue was first posted.

Lestropie, Jun 24 '22 08:06