bids-specification
schema: Define extensions exclusivity/composition?
Looking at https://github.com/bids-standard/bids-specification/pull/1033/files#diff-e1391ae7ff69f13355ee975c7fafc3414020f8ed7f3d26bfa92a4381429f51a0L4 and making a comment at https://github.com/bids-standard/bids-specification/pull/1033/files#r838617368 where, as in many other places, we have "alternative" extensions for a file (so only one extension should be used), I also saw
participants:
  required: false
  extensions:
    - .tsv
    - .json
where it is allowed to have multiple extensions, or even "worse": having .json makes sense only if there is a .tsv (unless, for inheritance, we have a .json at the top level for some .tsv files down the hierarchy). I also looked up curious cases like
src/schema/rules/datatypes/anat.yaml: extensions:
src/schema/rules/datatypes/anat.yaml- - .nii.gz
src/schema/rules/datatypes/anat.yaml- - .nii
src/schema/rules/datatypes/anat.yaml- - .json
where, I think, it is not allowed to have both .nii.gz and .nii (while, similarly to the above, it is OK to accompany either with a .json).
I wondered how we could encode all that in the schema. My initial thinking is to separate sidecars into a dedicated sidecar_extensions key, so it would look like
src/schema/rules/datatypes/anat.yaml: extensions:
src/schema/rules/datatypes/anat.yaml- - .nii.gz
src/schema/rules/datatypes/anat.yaml- - .nii
src/schema/rules/datatypes/anat.yaml- sidecar_extensions:
src/schema/rules/datatypes/anat.yaml- - .json
thus making it implied that extensions is "one of", while sidecar_extensions lists those which could accompany, iff a file with one of the extensions: is available. But would it then also be "one of" within sidecar_extensions? Do we have any counter use case which would render the above suggestion invalid?
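To make the proposed semantics concrete, here is a minimal Python sketch of how a validator might apply them to the set of extensions found for one file stem. The function name check_extensions and the rule layout are hypothetical, not existing validator code, and it deliberately ignores inheritance (a top-level .json serving files further down the hierarchy), which would need separate handling.

```python
def check_extensions(present, extensions, sidecar_extensions=()):
    """Return a list of problems for the extensions found on one file stem.

    present: extensions found on disk for the stem
    extensions: primary extensions, treated as mutually exclusive ("one of")
    sidecar_extensions: extensions allowed only next to a primary file
    Assumption: inheritance is NOT modeled here.
    """
    present = set(present)
    primary = sorted(present & set(extensions))
    sidecars = sorted(present & set(sidecar_extensions))
    unknown = sorted(present - set(extensions) - set(sidecar_extensions))

    problems = []
    if len(primary) > 1:
        problems.append(f"mutually exclusive extensions present: {primary}")
    if sidecars and not primary:
        problems.append(f"sidecar without a data file: {sidecars}")
    if unknown:
        problems.append(f"unknown extensions: {unknown}")
    return problems


# Rule shaped like the anat example above (hypothetical in-memory form).
rule = {"extensions": [".nii.gz", ".nii"], "sidecar_extensions": [".json"]}

print(check_extensions({".nii.gz", ".json"}, **rule))  # no problems
print(check_extensions({".nii.gz", ".nii"}, **rule))   # both primaries present
print(check_extensions({".json"}, **rule))             # orphan sidecar
```

Whether sidecar_extensions should itself be "one of" is left open here: the sketch accepts any number of sidecars as long as a primary file exists.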
WDYT (attn @bids-standard/schema -- the Team I just initiated, feel welcome to invite/add more people if I forgot anyone)?
I agree that we should figure out how to distinguish extensions that form sets from ones that do not (and are thus mutually exclusive). I think JSON is still a tough case because we have to figure out inheritance in the schema.
Are there any cases where a non-JSON data file can't have a sidecar JSON file? If there aren't, then I think we should probably remove .json from the list of extensions for non-JSON-based data files, and then just add .json as a special case in the rendering/validation code.
It could be worth abstracting the idea of sidecar that could apply to any file. BEP-027 is proposing that any file could have a .prov.jsonld sidecar.
> It could be worth abstracting the idea of sidecar that could apply to any file. BEP-027 is proposing that any file could have a .prov.jsonld sidecar.
good idea IMHO!
> Are there any cases where a non-JSON data file can't have a sidecar JSON file?
We should check programmatically. Quickly looking at grep output, here are some hits that we might indeed want to "fix" by making the sidecar generally applicable:
src/schema/rules/datatypes/eeg.yaml-- suffixes:
src/schema/rules/datatypes/eeg.yaml- - photo
src/schema/rules/datatypes/eeg.yaml: extensions:
src/schema/rules/datatypes/eeg.yaml- - .jpg
src/schema/rules/datatypes/ieeg.yaml-- suffixes:
src/schema/rules/datatypes/ieeg.yaml- - photo
src/schema/rules/datatypes/ieeg.yaml: extensions:
src/schema/rules/datatypes/ieeg.yaml- - .jpg
src/schema/rules/datatypes/micr.yaml-- suffixes:
src/schema/rules/datatypes/micr.yaml- - photo
src/schema/rules/datatypes/micr.yaml: extensions:
src/schema/rules/datatypes/micr.yaml- - .jpg
src/schema/rules/datatypes/micr.yaml- - .png
src/schema/rules/datatypes/micr.yaml- - .tif
src/schema/rules/datatypes/meg.yaml-- suffixes:
src/schema/rules/datatypes/meg.yaml- - markers
src/schema/rules/datatypes/meg.yaml: extensions:
src/schema/rules/datatypes/meg.yaml- - .sqd
src/schema/rules/datatypes/meg.yaml- - .mrk
src/schema/rules/datatypes/perf.yaml-- suffixes:
src/schema/rules/datatypes/perf.yaml- - asllabeling
src/schema/rules/datatypes/perf.yaml: extensions:
src/schema/rules/datatypes/perf.yaml- - .jpg
.... may be more ...
NB -- note the inconsistency for _photo across datatypes.
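As a rough illustration of the programmatic check, the following Python sketch lists rules whose extensions do not include .json. The rules dictionary is hand-copied from the grep hits above and its in-memory layout is an assumption; a real check would load the files under src/schema/rules/datatypes/ with a YAML parser, and the helper name rules_without_json is hypothetical.

```python
def rules_without_json(rules):
    """Yield (datatype, suffixes, extensions) for rules that lack '.json'."""
    for datatype, entries in rules.items():
        for entry in entries:
            if ".json" not in entry["extensions"]:
                yield datatype, entry["suffixes"], entry["extensions"]


# Hand-copied subset of the grep output; the layout is assumed, not the
# actual structure of the schema YAML files.
rules = {
    "eeg": [{"suffixes": ["photo"], "extensions": [".jpg"]}],
    "micr": [{"suffixes": ["photo"], "extensions": [".jpg", ".png", ".tif"]}],
    "meg": [{"suffixes": ["markers"], "extensions": [".sqd", ".mrk"]}],
    "ieeg": [{"suffixes": ["coordsystem"], "extensions": [".json"]}],
}

for datatype, suffixes, extensions in rules_without_json(rules):
    print(f"{datatype}: {suffixes} -> {extensions}")
```

Comparing the extension lists printed for the same suffix across datatypes would also surface inconsistencies such as the _photo one.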
So there is a nuance in "abstracting" it: unlike .prov.jsonld, without prescribed keys to include in that file, such a sidecar would generally be "bogus"/useless from a standardization point of view. But since BIDS allows arbitrary keys, it would not be invalid.
BUT there are also datatypes with only .json:
src/schema/rules/datatypes/ieeg.yaml-- suffixes:
src/schema/rules/datatypes/ieeg.yaml- - coordsystem
src/schema/rules/datatypes/ieeg.yaml: extensions:
src/schema/rules/datatypes/ieeg.yaml- - .json
... may be more ...
This is for https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/04-intracranial-electroencephalography.html#coordinate-system-json-_coordsystemjson, which is pretty much a data/sidecar hybrid file. I think it is even OK to make the general rule "any non-.json file can have a .json sidecar file".
What about a metafiles.yml that specifically addresses sidecar files that don't have an independent existence? For universal ones like JSON sidecars, no specific associations need to be defined. For others we can define things like:
- Selectors needed to match
- Entities that cannot be dropped while matching
- Whether this metafile MAY/MUST have a sidecar of its own
sidecar:
  extension: .json
provenance:
  extension: .prov.jsonld
events:
  suffix: events
  extension: .tsv
  match-entities:
    task: REQUIRED  # Indicates that this entity can't be dropped
  sidecar: OPTIONAL  # There MAY be sidecar JSON for these metafiles
  associations:
    suffix: [bold, eeg, meg, ieeg, beh, pet]
continuous:
  suffix: [physio, stim]
  extension: .tsv.gz
  match-entities:
    task: REQUIRED
  sidecar: REQUIRED  # There MUST be sidecar JSON for these metafiles
  associations:
    suffix: [bold, eeg, meg, ieeg, beh, pet]
Sorry, I don't think this specifically addresses the above discussion, but I wanted to write it somewhere vaguely relevant while it was in my head.
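For illustration only, here is a minimal Python sketch of how such a metafile rule might be applied when deciding whether a metafile belongs to a data file. The function applies_to and the dictionary layout are hypothetical, and the matching semantics are one possible reading of the idea above: REQUIRED in match-entities means the metafile must carry that entity, and every entity the metafile does carry must equal the data file's value.

```python
# In-memory form of the "events" rule sketched above (assumed layout).
EVENTS_RULE = {
    "suffix": ["events"],
    "extension": ".tsv",
    "match-entities": {"task": "REQUIRED"},
    "sidecar": "OPTIONAL",
    "associations": {"suffix": ["bold", "eeg", "meg", "ieeg", "beh", "pet"]},
}


def applies_to(rule, meta_entities, data_entities, data_suffix):
    """Return True if a metafile with meta_entities applies to the data file."""
    # The data file must have one of the associated suffixes.
    if data_suffix not in rule["associations"]["suffix"]:
        return False
    # Entities marked REQUIRED may not be dropped from the metafile name.
    for entity, level in rule.get("match-entities", {}).items():
        if level == "REQUIRED" and entity not in meta_entities:
            return False
    # Every entity the metafile carries must agree with the data file.
    return all(data_entities.get(k) == v for k, v in meta_entities.items())


bold = {"sub": "01", "task": "rest"}
print(applies_to(EVENTS_RULE, {"task": "rest"}, bold, "bold"))   # task kept and equal
print(applies_to(EVENTS_RULE, {"sub": "01"}, bold, "bold"))      # task dropped
print(applies_to(EVENTS_RULE, {"task": "nback"}, bold, "bold"))  # task differs
```

Under this reading a task-level events file can sit higher in the hierarchy and still match, while a file that drops task never does.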