Add minimally-processed Le Petit Prince dataset to datasets module

Open tsalo opened this issue 2 years ago • 9 comments

Summary

A dataset we may want to include in the new datasets module is Le Petit Prince.

Thanks to @emdupre for finding it!

Additional Detail

We will need to process this dataset with fMRIPrep.

Next Steps

  1. Determine preferred location for data.
  2. Add data to location.
  3. Build manifest (not sure how this is done).
  4. Add new dataset to datasets module.

tsalo avatar Aug 20 '21 19:08 tsalo

I have finally finished preprocessing Le Petit Prince. I think everything's good, though I haven't had a chance to do QC.

@jsheunis do you want me to put it anywhere in particular for multi-echo-super?

EDIT: The fMRIPrep derivatives are 423G, by the way.

tsalo avatar Mar 30 '22 16:03 tsalo

@tsalo wherever you prefer is good. To build it into a datalad dataset, all we need would be a list of file paths and their access URLs (e.g. the content of this csv file), assuming that the location of these files will remain constant (at least for a while). For the cambridge derivatives on OSF, I got that info from the manifest.json file that was included in the OSF project.
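A minimal sketch of the kind of file list described above: one row per file, pairing a relative path with its access URL. The base URL, file names, and column names (`filename`, `url`) are hypothetical placeholders, not the actual layout of the derivatives.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URL and relative paths, for illustration only.
BASE_URL = "https://example.org/derivatives"
RELATIVE_PATHS = [
    "sub-01/func/sub-01_task-example_echo-1_bold.nii.gz",
    "sub-01/func/sub-01_task-example_echo-2_bold.nii.gz",
]


def build_rows(base_url, relative_paths):
    """Pair each relative file path with the URL it can be fetched from."""
    base = base_url.rstrip("/")
    return [
        {"filename": path, "url": f"{base}/{quote(path)}"}
        for path in relative_paths
    ]


def write_manifest(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["filename", "url"])
    writer.writeheader()
    writer.writerows(rows)


rows = build_rows(BASE_URL, RELATIVE_PATHS)
buffer = io.StringIO()  # swap for open("manifest.csv", "w", newline="") to write a file
write_manifest(rows, buffer)
manifest_text = buffer.getvalue()
print(manifest_text)
```

A table like this could then, presumably, be passed to a tool such as `datalad addurls`, which takes a file of URLs plus format strings for the URL and file name columns. This only works as long as the file locations stay stable.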

jsheunis avatar Mar 30 '22 19:03 jsheunis

OSF is limited to 50 GB per public project, so I would have to split the derivatives across 10 projects. Does anyone know of a good alternative? @emdupre @handwerkerd @jbteves ?
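For a rough sense of the split: 423 GB at 50 GB per project needs at least 9 projects if everything packs perfectly, and uneven file sizes can push that higher, so an estimate of 10 is plausible. Below is a minimal sketch of one way to group files under the limit, using first-fit decreasing. The per-subject sizes are made up for illustration.

```python
import math

LIMIT_GB = 50   # per-project limit mentioned above
TOTAL_GB = 423  # size of the fMRIPrep derivatives mentioned earlier in the thread

# Lower bound on the number of projects, assuming perfect packing.
min_projects = math.ceil(TOTAL_GB / LIMIT_GB)


def pack_first_fit(sizes_gb, limit_gb):
    """First-fit decreasing: put each item in the first project with room."""
    projects = []
    ordered = sorted(sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        if size > limit_gb:
            raise ValueError(f"{name} ({size} GB) exceeds the {limit_gb} GB limit")
        for project in projects:
            if project["used"] + size <= limit_gb:
                project["files"].append(name)
                project["used"] += size
                break
        else:
            # No existing project has room, so start a new one.
            projects.append({"used": size, "files": [name]})
    return projects


# Hypothetical per-subject sizes in GB, not the real dataset.
example_sizes = {"sub-01": 30, "sub-02": 30, "sub-03": 20, "sub-04": 20}
projects = pack_first_fit(example_sizes, LIMIT_GB)
print(min_projects, len(projects))
```

First-fit decreasing is a heuristic, so it does not always find the smallest possible number of projects, but it is simple and usually close.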

tsalo avatar Mar 31 '22 17:03 tsalo

What about GIN?

jsheunis avatar Mar 31 '22 18:03 jsheunis

I can't think of any. I don't think any fully open system is set up to store all outputs and intermediate files from a processing pipeline right now. If this is for eventual output testing, we can save everything for one or two subjects and then save a more minimal set for everything else. For example, the original data is already shared. If we share the mixing matrices and component tables, you could potentially just share a script to generate all output volumes from those files. It's not ideal, but it might be easier than other options, even with this storage limit.

handwerkerd avatar Mar 31 '22 20:03 handwerkerd

> What about GIN?

I'll give GIN a shot. I recall that we looked at it at one point, but I had trouble using it. Is there a Singularity image that would make it easy to upload from an HPC with minimal knowledge?

> If this is for eventual output testing, we can save everything for one or two subjects and then save a more minimal set for everything else.

That's probably the primary use case, but it would also be great to have these data for examples, tutorials (like the Jupyter book), and methods development. I guess uploading only the ICA derivatives can be the option of last resort.

tsalo avatar Apr 03 '22 15:04 tsalo

> I'll give GIN a shot. I recall that we looked at it at one point, but I had trouble using it. Is there a Singularity image that would make it easy to upload from an HPC with minimal knowledge?

Any update on this @tsalo? I may be interested in using the dataset to validate a method I'm working on.

eurunuela avatar Sep 08 '22 08:09 eurunuela

Sorry, I guess I only followed up in the multi-echo-super issue. I believe I successfully pushed the derivatives to https://gin.g-node.org/ME-ICA/ds003643-fmriprep-derivatives.

tsalo avatar Sep 08 '22 13:09 tsalo

> Sorry, I guess I only followed up in the multi-echo-super issue. I believe I successfully pushed the derivatives to https://gin.g-node.org/ME-ICA/ds003643-fmriprep-derivatives.

This is great. Thank you!

eurunuela avatar Sep 09 '22 10:09 eurunuela