Add minimally-preprocessed EuskalIBUR dataset to datasets module

Open tsalo opened this issue 3 years ago • 11 comments

Summary

One dataset we may want to include in the new datasets module is EuskalIBUR.

Additional Detail

We'll need to run fMRIPrep on this dataset and then upload it to the OSF (or wherever).

Next Steps

  1. Determine preferred location for data.
  2. Add data to location.
  3. Build manifest (not sure how this is done).
  4. Add new dataset to datasets module.
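On step 3, one possible way to build a manifest is to list every file with a checksum so a fetcher can verify its downloads. The snippet below is only a minimal sketch under that assumption: the function names (`sha256sum`, `build_manifest`, `write_manifest`) and the one-line-per-file output format are hypothetical, not something the datasets module is known to require.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    data_dir = Path(data_dir)
    return {
        p.relative_to(data_dir).as_posix(): sha256sum(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(manifest, out_file):
    """Write one "<relative path> <sha256>" line per file (assumed format)."""
    with open(out_file, "w") as f:
        for name, digest in manifest.items():
            f.write(f"{name} {digest}\n")
```

Reading files in chunks keeps memory use flat for large NIfTI files. The manifest should be written outside the data directory so it does not list itself on a later run.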

tsalo avatar Aug 19 '21 16:08 tsalo

@smoia I've seen you making releases of EuskalIBUR_preproc. Is the preprocessed data (preferably echo-wise) going to be uploaded to OpenNeuro as well?

I mainly ask because I'd like to make adding an EuskalIBUR fetcher a high priority once #805 is merged, since it seems like the best dataset to use in the multi-echo-data-analysis book.

tsalo avatar Oct 16 '21 02:10 tsalo

Hey @tsalo! I'm trying to get my scripts less messy and confusing overall, so that's what you saw :sweat_smile: . We intend to release all of the data we collected eventually, following publications we use it in, but we were not thinking about sharing the preprocessed data as well - for many reasons. The main two reasons are that (1) even within our team we have different preprocessing pipelines - so which preprocessed data should we share? All of them? - and that (2) I'm working on my preprocessing to make it as BIDS compatible as possible, but I'm not at the level of being 100% BIDS compliant yet. That'll take time.

However, if you have an idea about this topic, we can discuss it. I'm tagging @CesarCaballeroGaudes and @eurunuela because of their involvement in the EuskalIBUR project (they have also worked on a different preprocessing pipeline). We can also have a chat at the next dev call (let me know if you want to, I'll partake).

smoia avatar Oct 19 '21 16:10 smoia

> We intend to release all of the data we collected eventually

Isn't all of the raw data available on OpenNeuro already? At least the functional data, events, and physio are on there, which is what I think folks using tedana's dataset fetcher will be most interested in.

> The main two reasons are that (1) even within our team we have different preprocessing pipelines - so which preprocessed data should we share? All of them?

That's a good point. If the physio processing is fairly standardized, we (the tedana team) could just run fMRIPrep on the functional data and run the physio data through whatever processing you recommend.

That said, if you or @CesarCaballeroGaudes and @eurunuela are comfortable sharing any echo-wise outputs and the necessary transforms to get the denoised data into standard space, that would be great, but no pressure.

tsalo avatar Oct 19 '21 16:10 tsalo

> Isn't all of the raw data available on OpenNeuro already? At least the functional data, events, and physio are on there, which is what I think folks using tedana's dataset fetcher will be most interested in.

IIRC only the breath-holding task was shared, but the entire dataset is a collection of a motor, localizer, and another task I can't remember, and 4 resting-state sessions. All that for all 10 sessions of each subject.

> That said, if you or @CesarCaballeroGaudes and @eurunuela are comfortable sharing any echo-wise outputs and the necessary transforms to get the denoised data into standard space, that would be great, but no pressure.

@CesarCaballeroGaudes and I were thinking of adding tedana to the preprocessing pipeline we're working on, as we would like to evaluate our algorithms with and without ME-ICA denoising. I guess we could share the inputs and outputs of tedana if @smoia and @CesarCaballeroGaudes agree.

eurunuela avatar Oct 19 '21 17:10 eurunuela

> Isn't all of the raw data available on OpenNeuro already? At least the functional data, events, and physio are on there, which is what I think folks using tedana's dataset fetcher will be most interested in.

Ooooh boy no, that's about a tenth of what we have! We also have multiple resting states and tasks - all coming public eventually!

> That's a good point. If the physio processing is fairly standardized, we (the tedana team) could just run fMRIPrep on the functional data and run the physio data through whatever processing you recommend.

That's what I was thinking about if I had to share preprocessed data now-ish. fMRIPrep shouldn't be unbearable to run even for me. Either that, or the preprocessing we'll adopt for the data paper (which will be more similar to mine then) - but that would come mid-next year at the earliest. The thing is that none of us have BIDS-compliant output. My preprocessing is available (as you noticed) and Eneko's and César's would have the AFNI history in the header, but that's the extent we'd go for...

smoia avatar Oct 19 '21 17:10 smoia

EuskalIBUR has 10 sessions for each of 10 subjects, with 4 resting states and 4 tasks (functional localizer by Pinel et al., HCP motor task, Simon task, and breath-hold CVR task), along with continuous recording of respiratory belt, expired CO2 and O2, and cardiac pulse. We will share the raw data and we would not mind sharing the preprocessed data as well. However, as indicated, we are building different preprocessing pipelines; basically @smoia mixed FSL, AFNI and ANTs commands, whereas @eurunuela and I are working on doing as much as possible (ultimately the entire pipeline) in AFNI, including surface preprocessing, in order to keep the history. As @eurunuela indicates, we plan to add tedana to this 'alternative' preprocessing at some point for OC and ME-ICA (@smoia has already carried this out within his pipeline).

I'm afraid I cannot keep track of all the programs: AFNI, fMRIPrep, etc... You are all much younger and better coders than me 😃

Physio recordings will also be made publicly available. We are performing the manual annotation of certain events (e.g. pulse peaks), but we are still discussing whether these annotations will be shared as well.

Taking all this into consideration, I don't think the preprocessed volumes can be BIDS compliant in the near future, but this is a task @smoia is pursuing for his scripts, and @eurunuela might consider it at some point if he wants.

CesarCaballeroGaudes avatar Oct 19 '21 21:10 CesarCaballeroGaudes

Sorry for the delay in responding. Given that not all of the data are available yet, I'm leaning toward providing fMRIPrep/FreeSurfer derivatives of each version of the dataset as it is updated (or at least at major releases). Running fMRIPrep on my school's HPC shouldn't be a problem for the first release, and we can always revisit our process for future releases.

Would it be possible to upload the preprocessed physio data to the OpenNeuro repo as derivatives? Just for the breath-hold data that have already been shared. If you're willing to share the manual annotations, that would be amazing too, but I think just the filtered and cleaned physio data would be sufficient for the CVR example we want to include in the multi-echo-data-analysis book.

tsalo avatar Nov 01 '21 16:11 tsalo

Hey @tsalo ! I'll try to get this done by Christmas.

smoia avatar Nov 24 '21 10:11 smoia

That would be amazing, thank you!

tsalo avatar Nov 24 '21 16:11 tsalo

@tsalo, would it be OK if, instead of BIDS derivatives, I put everything on OSF? I did not prepare them for BIDS, and I don't have the time to do so before losing access to the resources (I'm transferring).

smoia avatar Dec 10 '21 14:12 smoia

I think OSF would be a great place to put the files. It would be nice to organize/name the files on the OSF according to BIDS, but as long as all of the necessary files and metadata are available, we could always do that at a later date.
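To give a rough idea of what BIDS-style naming on the OSF could look like, here is a minimal sketch. The helper `bids_name` is hypothetical, it only handles a handful of entities (in the order the BIDS entity table lists them), and the labels in the example (`001`, `01`, `breathhold`, `preproc`) are made up for illustration rather than taken from the actual dataset.

```python
# Entities supported by this sketch, in BIDS entity-table order.
ENTITY_ORDER = ("sub", "ses", "task", "run", "echo", "space", "desc")


def bids_name(suffix, extension, **entities):
    """Build a BIDS-style filename from key-value entities."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"Unsupported entities: {sorted(unknown)}")
    if "sub" not in entities:
        raise ValueError("The 'sub' entity is required")
    # Emit entities in the fixed order, whatever order the caller used.
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix]) + extension


# Hypothetical labels, for illustration only.
example = bids_name(
    "bold", ".nii.gz", sub="001", ses="01", task="breathhold", echo=1, desc="preproc"
)
print(example)  # sub-001_ses-01_task-breathhold_echo-1_desc-preproc_bold.nii.gz
```

Even if the files go up with their current names, a mapping like this could be applied later, as long as the subject, session, task, and echo of each file can be recovered from the metadata.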

tsalo avatar Dec 10 '21 15:12 tsalo