
Subfolder structure in derivatives

Open • dnacombo opened this issue 5 years ago • 4 comments

I suppose this point has been raised before; I'm trying to revive it here.

No consensus has been reached as to the subfolder structure for derivatives.

Options that are on the table:

  • Consistent (but perhaps too rigid?) subfolder structure following the raw specification: sub-01/ses-xxx/xxx/sub-01_ses-xxx_suffix.ext and associated sidecar files. Many useless subfolder levels are to be expected for pipelines that use only a subset of the ses-xxx/xxx raw data and/or create only a few outputs per subject.
  • Flexible, but more error-prone: anything between sub-01/sub-01_suffix.ext and the above.
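To make the two options concrete, here is a minimal Python sketch of a path builder. The helper `derivative_path` and its parameters are hypothetical (not part of the BIDS specification or any BIDS tool). Passing a session and a datatype reproduces the raw-mirroring layout of the first option; leaving them out gives the flattest layout the second option would allow.

```python
from pathlib import PurePosixPath
from typing import Optional


def derivative_path(root: str, sub: str, suffix: str, ext: str,
                    ses: Optional[str] = None,
                    datatype: Optional[str] = None) -> PurePosixPath:
    """Build a derivative file path (hypothetical helper, for illustration only).

    With ses and datatype set, the path mirrors the raw layout (option 1).
    Leaving them out collapses those folder levels (flattest case of option 2).
    """
    entities = [f"sub-{sub}"]   # key-value pairs that go into the filename
    folders = [f"sub-{sub}"]    # folder levels below the pipeline root
    if ses is not None:
        entities.append(f"ses-{ses}")
        folders.append(f"ses-{ses}")
    if datatype is not None:
        folders.append(datatype)
    filename = "_".join(entities + [suffix]) + ext
    return PurePosixPath(root, *folders, filename)
```

For example, `derivative_path("derivatives/pipeline", "01", "T1w", ".nii.gz", ses="mri", datatype="anat")` gives `derivatives/pipeline/sub-01/ses-mri/anat/sub-01_ses-mri_T1w.nii.gz`, while `derivative_path("derivatives/pipeline", "01", "T1w", ".nii.gz")` gives `derivatives/pipeline/sub-01/sub-01_T1w.nii.gz`.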

Opinions wanted

Is there any situation that precludes adopting the first option above?

It seems to me that the complexity of the subfolder structure would be offset by consistency across derivatives/pipelines and raw structures, which are then easy to crawl programmatically. Perhaps this is inefficient? What were the relevant points in previous discussions?
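One way to see the appeal of the consistent layout: a single fixed-depth glob pattern can find a given file type for every subject and session. The sketch below assumes the first option's sub-*/ses-*/<datatype>/ layout; the helper name `find_rigid` is hypothetical. It also shows the flip side: a file stored under a flatter layout is silently missed.

```python
from pathlib import Path
from typing import List


def find_rigid(root: Path, datatype: str, suffix: str) -> List[Path]:
    """Find files with a fixed-depth pattern that assumes the raw-mirroring layout.

    Hypothetical helper: only matches root/sub-*/ses-*/<datatype>/*_<suffix>.*,
    so anything stored at a different depth is not returned.
    """
    pattern = f"sub-*/ses-*/{datatype}/*_{suffix}.*"
    return sorted(root.glob(pattern))
```

With both sub-01/ses-mri/anat/sub-01_ses-mri_T1w.nii.gz and sub-01/sub-01_T1w.nii.gz present, `find_rigid(root, "anat", "T1w")` returns only the first.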

@robertoostenveld @ftadel @agramfort @schoffelen

dnacombo • Nov 26 '19 14:11

Subfolders are mostly there for human readability. Programmatic indexing is not a serious constraint here.

The derivatives specification requires keeping any filename elements that still apply to the derivative file. If we extend this to directories (which we don't seem to address directly at this point), then when functional data are transformed into something non-functional (e.g., a connectivity matrix), it makes sense to drop func/ and possibly add something like connectivity/. But data cleaning generally doesn't change the intent of the file, so keeping a motion-corrected functional image in func/ makes sense.

The boundary between functional derivatives and non-functional derivatives of functional data seems blurry. I would suggest using these subfolders if they're useful for your application, and dropping them if they seem to cause more confusion. (Again, to humans. I don't think machines care.)

effigies • Nov 26 '19 14:11

OK, so are we leaning towards a flexible, everybody-does-it-their-own-way solution?

Although machines are able to find files, humans writing scripts that pick files from the directory structure need to agree on it.

A very concrete use case: we are preparing a course with a MEG BIDS dataset to be processed by three different teams using different toolboxes. All could use a more or less complete portion of the output of the FreeSurfer recon-all pipeline. Depending on which FreeSurfer command was used, or where the data come from, the FreeSurfer folder structure is either freesurfer/sub-01/ or freesurfer/sub-01/ses-mri/anat/.
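Until a convention is agreed on, a script shared between the teams could simply check both layouts. This is a minimal sketch under the assumption that a FreeSurfer subject folder can be recognized by its surf/ subdirectory; `resolve_freesurfer_subject` is a hypothetical helper, not an existing tool.

```python
from pathlib import Path
from typing import Optional


def resolve_freesurfer_subject(fs_root: Path, sub: str,
                               ses: Optional[str] = None) -> Path:
    """Return the FreeSurfer subject folder under either layout (hypothetical helper).

    Tries fs_root/sub-<sub>/ses-<ses>/anat/ first (if a session is given),
    then fs_root/sub-<sub>/. A candidate is accepted if it contains surf/.
    """
    candidates = []
    if ses is not None:
        candidates.append(fs_root / f"sub-{sub}" / f"ses-{ses}" / "anat")
    candidates.append(fs_root / f"sub-{sub}")
    for candidate in candidates:
        if (candidate / "surf").is_dir():
            return candidate
    raise FileNotFoundError(f"No FreeSurfer subject folder for sub-{sub} in {fs_root}")
```

Checking for a marker folder rather than the mere existence of sub-01/ avoids picking the outer folder when the subject actually lives in the nested one.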

I understand there are no fixed directives and both are valid. Should this change? Are there existing tools that can pick files from pipeline outputs without the subfolders being specified manually?

dnacombo • Nov 26 '19 16:11

> Although machines are able to find files, humans writing scripts that pick files from the directory structure need to agree on it.

Fair point. I was thinking in terms of crawling the directory, not building path templates and assuming you'll find something there. I think if you want to do that, you're just going to need to know things about the outputs of the specific pipelines you want to work with.

That said, if a couple of pipelines are producing the same derivatives, intended to be used in the same downstream pipelines, it would be worth forming some consensus and seeing whether it can be codified into the spec.

> Depending on which FreeSurfer command was used, or where the data come from, the FreeSurfer folder structure is either freesurfer/sub-01/ or freesurfer/sub-01/ses-mri/anat/.

FreeSurfer isn't BIDS compliant, and it would be a significant effort to make it so. So I wouldn't expect anything beyond the subject names to be made congruent with BIDS with a manageable amount of effort.

freesurfer/sub-01/ses-mri/anat/ seems like it might make it difficult to work with FreeSurfer tools. Perhaps you could hack it by setting your subject ID to sub-01/ses-mri/anat, but then your FreeSurfer subject ID is quite different from your BIDS subject ID, making script writing a bit more cumbersome.
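The mismatch can be made concrete by computing the subject ID FreeSurfer would see, i.e. the subject folder's path relative to SUBJECTS_DIR. Both helpers below are hypothetical illustrations: they do not call FreeSurfer and make no claim about which subject IDs FreeSurfer tools actually accept.

```python
from pathlib import PurePosixPath


def freesurfer_subject_id(subject_folder: str, subjects_dir: str) -> str:
    """Subject ID relative to SUBJECTS_DIR (hypothetical helper, illustration only)."""
    return PurePosixPath(subject_folder).relative_to(subjects_dir).as_posix()


def matches_bids_subject(fs_id: str, sub: str) -> bool:
    """True if the FreeSurfer subject ID is exactly the BIDS subject name."""
    return fs_id == f"sub-{sub}"
```

With SUBJECTS_DIR set to freesurfer, the flat layout yields the ID sub-01, which matches the BIDS subject, whereas the nested layout yields sub-01/ses-mri/anat, which does not.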

So I'd say the former makes more sense, though BIDS has very little to say about this, and your constraint is going to be the FreeSurfer suite and any other tools that assume a FreeSurfer structure.

> Are there existing tools that can pick files from pipeline outputs without the subfolders being specified manually?

PyBIDS. Perhaps BIDS-MATLAB, though I haven't looked at it at all closely.
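PyBIDS's `BIDSLayout.get()` selects files by entities (subject, session, suffix, and so on) rather than by explicit paths. The following is not PyBIDS code but a simplified, self-contained sketch of that idea: it parses each filename into entities and filters on them at any folder depth. It uses the raw entity keys (sub, ses) instead of PyBIDS's names (subject, session), and it ignores most of the specification's rules.

```python
from pathlib import Path
from typing import Dict, List


def parse_bids_name(name: str) -> Dict[str, str]:
    """Split a BIDS-style filename into entities, suffix and extension (simplified)."""
    stem, dot, ext = name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs if "-" in pair)
    entities["suffix"] = suffix
    entities["extension"] = dot + ext
    return entities


def query(root: Path, **filters: str) -> List[Path]:
    """Return files whose filename entities match every filter, at any depth."""
    hits = []
    for path in root.rglob("sub-*"):
        if path.is_file():
            entities = parse_bids_name(path.name)
            if all(entities.get(key) == value for key, value in filters.items()):
                hits.append(path)
    return sorted(hits)
```

For instance, `query(root, sub="01", suffix="T1w")` finds both sub-01/ses-mri/anat/sub-01_ses-mri_T1w.nii.gz and sub-01/sub-01_T1w.nii.gz, because only the filename is consulted.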

Querying FreeSurfer structures is still very likely to be out of scope here. If your tool has translated everything into something BIDSier, then those tools might work.

effigies • Nov 26 '19 16:11

Could we declare that we should use the FreeSurfer wiki recommendation (https://surfer.nmr.mgh.harvard.edu/fswiki/BIDS) and consider this issue resolved?

This conversation is linked with two other issues:

  • https://github.com/bids-standard/bids-specification/issues/461
  • https://github.com/bids-standard/bids-specification/issues/1130

ftadel • Jun 18 '22 10:06

I feel that this discussion belongs in the BEP35 Google Doc: https://docs.google.com/document/d/1tFRNumQyIgjXBNC3brFDLO9FaikjL84noxK6Om-Ctik/edit#heading=h.gjdgxs

@ftadel @dnacombo make sure to have a look if you have not already

Remi-Gau • Sep 26 '23 13:09