bids-specification
bids-specification copied to clipboard
inheritance: "false positive" conflict among metadata files if some entities are missing
Originally brought up in https://github.com/bids-standard/bids-specification/issues/102#issuecomment-446833775 and discussed briefly in the https://github.com/bids-standard/bids-specification/pull/946/files#r771841807 with @Lestropie.
The Inheritance Principle (IP) has
2. For a given data file, any metadata file is applicable to that data file if:
...
c.The metadata filename does not include any entity absent from the data filename.
...
4. There MUST NOT be multiple metadata files applicable to a data file at one level of the directory hierarchy.
from which for IMHO a legit (didn't check with bids-validator though) case of
├─ sub-01/
│ └─ func/
│ ├─ sub-01_task-rest_acq-tricky_bold.nii.gz
│ ├─ sub-01_task-rest_acq-tricky_bold.json
│ ├─ sub-01_task-rest_bold.nii.gz
│ ├─ sub-01_task-rest_bold.json
we would get a violation of the rule 4 if we follow 2.c
since both sub-01_task-rest_bold.json
and sub-01_task-rest_acq-tricky_bold.json
would be applicable to sub-01_task-rest_acq-tricky_bold.nii.gz
.
I see following possible Fixes:
- F1: do declare such cases/datasets "non-legit" due to inheritance principle (bids-validator should catch that then)
- adjust inheritance principle to make such conflicts avoidable. Possible way I see:
-
F2: Add clarification to 4 (e.g. 4.a) that "- metadata files at the level of the data file and matching data file exactly in all entities and suffix, but only different in extension, are not to be considered as metadata files for other data files" . I.e in the example above
sub-01_task-rest_bold.json
is not considered forsub-01_task-rest_acq-tricky_bold.nii.gz
since there issub-01_task-rest_bold.nii.gz
.- cons: implementation might be tricky since we do not per se have "data suffix" since there could be multitude. So the simplest implementation would be "test if there is ANY other file without such (.json) extension". Then there would be another separate validation check if that other file is part of BIDS according to the schema.
-
F2: Add clarification to 4 (e.g. 4.a) that "- metadata files at the level of the data file and matching data file exactly in all entities and suffix, but only different in extension, are not to be considered as metadata files for other data files" . I.e in the example above
- Contribute more!
With #1003, sub-01_task-rest_acq-tricky_bold.nii.gz
would inherit sub-01_task-rest_bold.json
first, and sub-01_task-rest_acq-tricky_bold.json
second. So the second could override contents of the first and/or add new fields.
Generally in a situation like this it would be recommended that both images should have the acq
entity in order to disambiguate fully. But with #1003 I'm shooting for an objective implementation rather than guidelines. Combined, the solution would look something like:
├─ sub-01/
│ └─ func/
│ ├─ sub-01_task-rest_acq-nontricky_bold.nii.gz
│ ├─ sub-01_task-rest_acq-nontricky_bold.json
│ ├─ sub-01_task-rest_acq-tricky_bold.nii.gz
│ ├─ sub-01_task-rest_acq-tricky_bold.json
│ ├─ sub-01_task-rest_bold.json
So sub-01_task-rest_bold.json
contains information that's common to resting-state acquisitions in this subject, and *_acq-*_bold.json
contain acquisition-specific information. As you say, this is explicitly illegal according to the current IP; #1003 is designed to make it legal.
Thank you @Lestropie , I agree that #1003 would address this issue and make it legit. Thinking further, I am not even sure if validator should even warn about such cases since I might want to use them on purpose in cases where e.g. _acq-tricky
needs to change/add only few metadata fields on top of the .json without it. But we might want to add some generic logic to validation of inheritance principle like "WARNING: later in inheritance principle file overloads with different values significant (>5 && >20%) of metadata fields" to prevent cases where inheritance principle places largely unrelated metadata files into its "sequence".
Personally I certainly wouldn't want to be issuing a warning due to inheritance from multiple files in a single directory. I'm proposing it specifically because I need to use it, so a warning would be counter-productive.
A validator warning about overwrites in the process of inheritance would be reasonable I think. For me, I would be avoiding overwrites, I merely want to be able to store sidecar information at the appropriate point in the hierarchy. Overwriting is ill-advised from my own perspective, but is a possibility and therefore needs to be described robustly.