bids-specification
`rawdata` & root (top-level) BIDS dataset
I find the "Source vs. raw vs. derived data" section a bit confusing. It currently presents the following as an alternative way to organize BIDS data:
└─ my_dataset-1/
   ├─ sourcedata
   ├─ ...
   ├─ rawdata/
   │  ├─ dataset_description.json
   │  ├─ participants.tsv
   │  ├─ sub-01/
   │  ├─ sub-02/
   │  └─ ...
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ...
My question is: what is the top-level BIDS directory in this case? Or how do I programmatically and reliably find the subject raw data in general?
Let's say:
- I set `my_dataset-1` as my BIDS folder: it has `derivatives`, but I can't find `dataset_description.json`. There is no subject data directly within it, and the other folders within it are not BIDS-compliant.
- I set `my_dataset-1/rawdata` as my BIDS folder: then I can't find `derivatives` as stated in the BIDS specs (putting `derivatives` in `rawdata` does not make sense at all).
Initially there was a requirement for dataset_description.json under my_dataset-1, but https://github.com/bids-standard/bids-specification/pull/1741 removes it.
I wonder if it's a good idea to at least put some file in place to indicate that this is the root of the BIDS directory. Could this file include descriptors indicating which components of the folder are BIDS-compliant? For example, ask people to provide BIDSDatasetLinks if the folder is structured this way:
{
  "BIDSDatasetLinks": {
    "rawdata": "bids::rawdata",
    "derivatives": "bids::derivatives"
  }
}
Here "rawdata" would default to the current directory, and "bids::derivatives" would be the default for "derivatives".
To address https://github.com/bids-standard/bids-specification/pull/1741#issuecomment-2059905338, people should add a .bidsignore file.
> what is the top-level BIDS directory in this case?
In your example, my_dataset-1/ is not a BIDS dataset. It is a directory that houses one BIDS dataset (rawdata). In your example, the derivatives directory does not have a dataset_description.json, which means that it is not a BIDS derivatives dataset. If you added a dataset_description.json there, then my_dataset-1 would house two BIDS datasets, but it would still not be a BIDS dataset in itself.
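As a rough illustration of the point above, a project directory can be scanned for the BIDS datasets it houses by looking for `dataset_description.json` files. This is only a sketch, not an official tool: `find_bids_datasets` is a hypothetical helper name, and it assumes that a missing `DatasetType` should be treated as `"raw"`, which is the backwards-compatible default described in the specification.

```python
import json
from pathlib import Path


def find_bids_datasets(project_root):
    """Map each directory holding a dataset_description.json to its DatasetType.

    Hypothetical helper: the search covers project_root itself and every
    directory below it.
    """
    found = {}
    for desc in sorted(Path(project_root).rglob("dataset_description.json")):
        with desc.open(encoding="utf-8") as f:
            meta = json.load(f)
        # DatasetType is optional; assume "raw" when it is absent.
        found[desc.parent] = meta.get("DatasetType", "raw")
    return found
```

For the layout in the question, this would report `my_dataset-1/rawdata` as a raw dataset and nothing for `derivatives/` until a `dataset_description.json` is added there.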
> I wonder if it's a good idea to at least put some file in to indicate that this is the root of BIDS directory.

That is what dataset_description.json does.

> Or how do I programmatically and reliably find the subject raw data in general?
That is being done via DatasetLinks. That is, if in your derivatives you define Sources, the sources will be specified using BIDS URIs and these BIDS URIs will make reference to datasets that are specified in DatasetLinks.
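To make that lookup concrete, here is a minimal sketch of resolving a BIDS URI of the form `bids:<dataset-name>:<relative-path>` against a `DatasetLinks` mapping. The function name `resolve_bids_uri` is hypothetical, and the sketch makes several assumptions: an empty dataset name refers to the current dataset, relative link targets are interpreted relative to the current dataset root, and only relative paths and `file://` URIs are handled (remote targets such as DOIs are left out).

```python
import os
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_bids_uri(uri, dataset_root, dataset_links):
    """Resolve 'bids:<dataset-name>:<relative-path>' to a local path (sketch)."""
    scheme, sep, rest = uri.partition(":")
    if scheme != "bids" or not sep:
        raise ValueError(f"not a BIDS URI: {uri!r}")
    name, sep, rel_path = rest.partition(":")
    if not sep:
        raise ValueError(f"missing relative path in BIDS URI: {uri!r}")

    root = Path(dataset_root)
    if name == "":
        # Assumption: an empty name points at the current dataset.
        base = root
    else:
        target = dataset_links[name]  # KeyError if the name is not declared
        parsed = urlparse(target)
        if parsed.scheme == "file":
            base = Path(url2pathname(parsed.path))
        elif parsed.scheme == "":
            # Assumption: relative targets are relative to the dataset root.
            base = root / target
        else:
            raise NotImplementedError(f"unsupported link target: {target!r}")

    # Lexical normalization only; symlinks are not resolved.
    return Path(os.path.normpath(base / rel_path))
```

With `dataset_links = {"raw": "../../rawdata"}` declared in `derivatives/pipeline_1`, a source such as `bids:raw:sub-01/anat/sub-01_T1w.nii.gz` would resolve to a file under `my_dataset-1/rawdata/`.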
@dipterix please check out the soon-to-be-released latest version of the BIDS specification, which has that example reworked a little: https://bids-specification.readthedocs.io/en/latest/common-principles.html#source-vs-raw-vs-derived-data
└─ my_project-1/
   ├─ sourcedata/
   │  ├─ dicoms/
   │  ├─ raw/
   │  │  ├─ sub-01/
   │  │  ├─ sub-02/
   │  │  ├─ ...
   │  │  └─ dataset_description.json
   │  └─ ...
   └─ derivatives/
      ├─ pipeline_1/
      ├─ pipeline_2/
      └─ ...
but to get closer to answering your two posed questions, please have a look at
- https://github.com/bids-standard/bids-specification/pull/1861
where I argue that the entire project folder can be a BIDS dataset, and then by convention sourcedata/raw would be such a "raw BIDS dataset". Note though that in principle there could be multiple raw BIDS datasets used in a project, or used to create another "derived raw BIDS dataset" (e.g. by combining multiple datasets into one), so such a convention alone might not be sufficient in some cases.
I would consider the issue resolved with the 1.10.1 release, which introduced the "study" DatasetType.