tedana
Add minimally-processed Le Petit Prince dataset to datasets module
Summary
A dataset we may want to include in the new `datasets` module is Le Petit Prince.
Thanks to @emdupre for finding it!
Additional Detail
We will need to process this dataset with fMRIPrep.
Next Steps
- Determine preferred location for data.
- Add data to location.
- Build manifest (not sure how this is done).
- Add new dataset to `datasets` module.
I have finally finished preprocessing Le Petit Prince. I think everything's good, though I haven't had a chance to do QC.
@jsheunis do you want me to put it anywhere in particular for multi-echo-super?
EDIT: The fMRIPrep derivatives are 423G, by the way.
@tsalo wherever you prefer is good. To build it into a DataLad dataset, all we need is a list of file paths and their access URLs (e.g., the content of this csv file), assuming that the location of these files will remain constant (at least for a while). For the Cambridge derivatives on OSF, I got that info from the manifest.json file that was included in the OSF project.
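To sketch what assembling that list might look like: the snippet below converts a hypothetical manifest.json (assumed here to be a flat list of entries with `path` and `url` keys; the real OSF manifest layout may differ) into the two-column CSV that `datalad addurls` consumes.

```python
import csv
import io
import json


def manifest_to_addurls_csv(manifest_text, csv_path=None):
    """Turn a manifest listing into a path,url CSV for ``datalad addurls``.

    NOTE: the manifest structure here is an assumption — a JSON list of
    entries, each with "path" and "url" keys. Adjust the key names to
    match the actual manifest.json in the project.
    """
    entries = json.loads(manifest_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["path", "url"])
    writer.writeheader()
    for entry in entries:
        writer.writerow({"path": entry["path"], "url": entry["url"]})
    text = buf.getvalue()
    if csv_path is not None:
        # Optionally persist the table for datalad to read.
        with open(csv_path, "w", newline="") as f:
            f.write(text)
    return text


# Example with a made-up manifest entry (paths and URLs are placeholders):
demo = json.dumps([
    {"path": "sub-01/func/sub-01_task-lpp_bold.nii.gz",
     "url": "https://example.org/files/abc123"},
])
print(manifest_to_addurls_csv(demo))
```

The resulting table could then be fed to something like `datalad addurls table.csv '{url}' '{path}'` to register the files in the dataset.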
The OSF is limited to 50 GB per public project, so I would have to split the derivatives across roughly 10 projects. Does anyone know of a good alternative? @emdupre @handwerkerd @jbteves?
What about GIN?
I can't think of any. I don't think any fully open system is set up today to store all outputs and intermediate files from a processing pipeline. If this is for eventual output testing, we can save everything for one or two subjects and then save a more minimal set for everything else. For example, the original data are already shared. If we share the mixing matrices and component tables, we could potentially just share a script to generate all output volumes from those files. It's not ideal, but it might be easier than other options, given the storage limit.
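To illustrate that last idea, here is a minimal sketch of what such a regeneration script might do, assuming an optimally combined time series, the ICA mixing matrix, and the rejected-component indices from the component table are available. All names here are hypothetical, and tedana's actual denoising outputs may differ in detail.

```python
import numpy as np


def regenerate_denoised(data, mixing, rejected):
    """Rebuild a denoised time series from shared ICA derivatives.

    data    : (T, V) optimally combined time series (time x voxels)
    mixing  : (T, C) ICA mixing matrix (one column per component)
    rejected: indices of components flagged "rejected" in the
              component table

    Fits component betas by least squares, then subtracts the
    contribution of the rejected components (non-aggressive denoising).
    """
    betas, *_ = np.linalg.lstsq(mixing, data, rcond=None)
    noise = mixing[:, rejected] @ betas[rejected, :]
    return data - noise


# Tiny synthetic example: 2 components, one rejected.
rng = np.random.default_rng(0)
T, V = 50, 10
mixing = rng.standard_normal((T, 2))
signal = np.outer(mixing[:, 0], rng.standard_normal(V))
noise = np.outer(mixing[:, 1], rng.standard_normal(V))
data = signal + noise
clean = regenerate_denoised(data, mixing, rejected=[1])
# The rejected component's contribution is removed, leaving the signal.
```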
> What about GIN?
I'll give GIN a shot. I recall that we looked at it at one point, but I had trouble using it. Is there a Singularity image that would make it easy to upload from an HPC with minimal knowledge?
> If this is for eventual output testing, we can save everything for one or two subjects and then save a more minimal set for everything else.
That's probably the primary use case, but it would also be great to have these data for examples, tutorials (like the Jupyter book), and methods development. I guess uploading only the ICA derivatives can be the option of last resort.
> I'll give GIN a shot. I recall that we looked at it at one point, but I had trouble using it. Is there a Singularity image that would make it easy to upload from an HPC with minimal knowledge?
Any update on this @tsalo? I may be interested in using the dataset to validate a method I'm working on.
Sorry, I guess I only followed up in the multi-echo-super issue. I believe I successfully pushed the derivatives to https://gin.g-node.org/ME-ICA/ds003643-fmriprep-derivatives.
> Sorry, I guess I only followed up in the multi-echo-super issue. I believe I successfully pushed the derivatives to https://gin.g-node.org/ME-ICA/ds003643-fmriprep-derivatives.
This is great. Thank you!