
`rawdata` & root (top-level) BIDS dataset

Open dipterix opened this issue 1 year ago • 2 comments

I find the "Source vs. raw vs. derived data" section a bit confusing to read. It currently presents the following alternative way to organize BIDS data:

└─ my_dataset-1/
   ├─ sourcedata 
   ├─ ... 
   ├─ rawdata/
   │  ├─ dataset_description.json 
   │  ├─ participants.tsv 
   │  ├─ sub-01/
   │  ├─ sub-02/
   │  └─ ... 
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ... 

My question is: what is the top-level BIDS directory in this case? Or how do I programmatically and reliably find the subject raw data in general?

Let's say

  1. If I set my_dataset-1 as my BIDS folder, it has derivatives, but there is no dataset_description.json and no subject data directly inside it. The other folders inside it are not BIDS-compliant.
  2. If I set my_dataset-1/rawdata as my BIDS folder, then I can't find derivatives where the BIDS spec says they should be (putting derivatives inside rawdata does not make sense at all).

Initially there was a requirement for dataset_description.json under my_dataset-1, but https://github.com/bids-standard/bids-specification/pull/1741 removed it.

I wonder if it would be a good idea to require at least one file that marks this directory as the root of the BIDS project. That file could declare which components of the folder are BIDS-compliant. For example, people could be asked to provide BIDSDatasetLinks when the folder is structured this way:

{
  "BIDSDatasetLinks": {
    "rawdata": "bids::rawdata",
    "derivatives": "bids::derivatives"
  }
}

Here "rawdata" would default to the current directory, and the "derivatives" value shown is already the default.

To address https://github.com/bids-standard/bids-specification/pull/1741#issuecomment-2059905338, people should add a .bidsignore file.

dipterix avatar Aug 04 '24 02:08 dipterix

> what is the top-level BIDS directory in this case?

In your example, my_dataset-1/ is not a BIDS dataset. It is a directory that houses one BIDS dataset (rawdata). In your example, the derivatives directory does not have a dataset_description.json, which means that it is not a BIDS derivatives dataset. If you added a dataset_description.json there, then my_dataset-1 would house two BIDS datasets, but it would still not be a BIDS dataset in itself.

> I wonder if it's a good idea to at least put some file in to indicate that this is the root of BIDS directory.

That is what dataset_description.json does.
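To illustrate the point, here is a minimal Python sketch that treats any directory containing a dataset_description.json as a BIDS dataset root. The function name `find_bids_datasets` is hypothetical, and falling back to "raw" when DatasetType is missing is my reading of the spec's backwards-compatibility default, so treat this as an assumption-laden example rather than an official tool.

```python
import json
from pathlib import Path


def find_bids_datasets(project_dir):
    """Map each directory holding a dataset_description.json to its DatasetType.

    Hypothetical helper: it only looks for the marker file and does not
    validate the dataset in any other way.
    """
    found = {}
    for desc in sorted(Path(project_dir).rglob("dataset_description.json")):
        with desc.open(encoding="utf-8") as f:
            meta = json.load(f)
        # Assumption: a missing DatasetType is treated as "raw".
        found[desc.parent] = meta.get("DatasetType", "raw")
    return found
```

Run against the layout from the original post, this would report my_dataset-1/rawdata only, plus any pipeline directory under derivatives/ that carries its own dataset_description.json.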

> Or how do I programmatically and reliably find the subject raw data in general?

That is done via DatasetLinks. That is, if you define Sources in your derivatives, the sources are specified as BIDS URIs, and these BIDS URIs refer to datasets that are declared in DatasetLinks.
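As a rough sketch of how that resolution could work, the Python below maps a BIDS URI of the form bids:<dataset-name>:<relative-path> to a local path. The function name `resolve_bids_uri` is hypothetical, and the sketch assumes that DatasetLinks values are either relative paths or file:// URIs; other schemes are rejected instead of fetched.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def resolve_bids_uri(uri, dataset_root, dataset_links):
    """Resolve 'bids:<dataset-name>:<relative-path>' to a local path.

    dataset_root is the directory of the dataset whose metadata holds the
    URI; dataset_links is that dataset's DatasetLinks mapping.
    """
    scheme, name, rel_path = uri.split(":", 2)
    if scheme != "bids":
        raise ValueError(f"not a BIDS URI: {uri!r}")
    if name == "":
        # An empty dataset name refers to the current dataset.
        base = Path(dataset_root)
    else:
        link = urlparse(dataset_links[name])
        if link.scheme not in ("", "file"):
            # Assumption: remote links (for example DOIs) are out of scope here.
            raise NotImplementedError(f"unsupported link: {dataset_links[name]!r}")
        # A relative link is taken relative to the current dataset root.
        base = Path(dataset_root) / link.path
    # Collapse ".." segments lexically, without touching the file system.
    return Path(os.path.normpath(base / rel_path))
```

For example, with DatasetLinks of {"raw": "../../rawdata"} declared in derivatives/pipeline_1, the URI bids:raw:sub-01/anat/sub-01_T1w.nii.gz would resolve into the sibling rawdata directory.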

sappelhoff avatar Aug 27 '24 12:08 sappelhoff

@dipterix please check out the soon-to-be-released latest version of the BIDS specification, which has that example reworked a little: https://bids-specification.readthedocs.io/en/latest/common-principles.html#source-vs-raw-vs-derived-data

└─ my_project-1/
   ├─ sourcedata/
   │  ├─ dicoms/
   │  ├─ raw/
   │  │  ├─ sub-01/
   │  │  ├─ sub-02/
   │  │  ├─ ... 
   │  │  └─ dataset_description.json 
   │  └─ ... 
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ... 

To get closer to answering your two questions, though, please have a look at

  • https://github.com/bids-standard/bids-specification/pull/1861

where I argue that the entire project folder can be a BIDS dataset, and then by convention sourcedata/raw would be such a "raw BIDS dataset". Note, though, that in principle there could be multiple raw BIDS datasets used in a project, or used to create another "derived raw BIDS dataset" (for example, by combining multiple datasets into one), so such a convention alone might not be sufficient in some cases.

yarikoptic avatar Aug 29 '24 17:08 yarikoptic

I would consider the issue resolved with the 1.10.1 release, which introduced the "study" DatasetType.

yarikoptic avatar Oct 20 '25 12:10 yarikoptic