bids-specification
[ENH] schema - specify what is valid at the root of a dataset
prior art: https://github.com/bids-standard/bids-specification/issues/884 https://github.com/ANCPLabOldenburg/ancp-bids/blob/main/ancpbids/data/bids_graph_schema.yaml
Problems
- The current schema has no explicit way to state that something is a directory as opposed to a file.
- Filenames are described inconsistently. `top_level_files.yaml` and `associated_data.yaml` use the key of the object to specify the filename (excluding the extension), while file names described in `datatypes/*.yaml` use a combination of entities + suffix + extension.
New keys used in rules to help resolve these issues:
- `type: directory`
  We currently have no way of saying that the thing being described is a directory. If the key is omitted we could assume it is a file, or we could make every file use `type: file`.
- `children:`
  This would specify which file and directory names are valid inside a given directory.
- ~~`literal:`~~ Decided to use `pattern` to cover this use case.
  Some files in datatypes are simple single-word suffixes with an extension. When interpreting entity-suffix-extension pattern rules we use the underscore as the split character separating entities from each other and entities from the suffix. There are files like `dataset_description.json` whose pre-extension string contains an underscore, so using suffix + extension with no entities would need to be understood as "don't split on `_`". `literal` gets around that by saying: just look for this exact string, and forget about entities, suffixes, and splitting.
- `pattern:`
  Pipelines can have any name. This would be interpreted as a regex to match against the directory or file name. `literal` and `pattern` could be merged, since a regex matches literals just fine.
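As a rough illustration of how `type`, `children`, and `pattern` could work together, here is a minimal Python sketch. Only those three key names come from this proposal. The rule layout, the rule names (`root`, `derivatives`, `pipeline`), and the `validate_dir` helper are hypothetical and are not the actual schema or validator.

```python
import re

# Hypothetical rules, written as a dict mirroring what the YAML might look like.
# Only the keys `type`, `children`, and `pattern` come from this proposal.
RULES = {
    "root": {
        "type": "directory",
        "children": ["dataset_description", "derivatives"],
    },
    # No `type` key: assumed to be a file.
    "dataset_description": {"pattern": r"dataset_description\.json"},
    "derivatives": {
        "type": "directory",
        "pattern": r"derivatives",
        "children": ["pipeline"],
    },
    # Pipelines can have any name, so the pattern is permissive.
    # No `children` key: the contents are not checked in this sketch.
    "pipeline": {"type": "directory", "pattern": r"[^/]+"},
}


def rule_type(rule):
    """A rule with no `type` key is assumed to describe a file."""
    return rule.get("type", "file")


def validate_dir(rule_name, tree, rules=RULES):
    """Return the paths in `tree` that no child rule of `rule_name` allows.

    `tree` maps entry names to a nested dict (a directory) or None (a file).
    """
    errors = []
    for entry, subtree in tree.items():
        entry_type = "directory" if isinstance(subtree, dict) else "file"
        matched = None
        for child_name in rules[rule_name].get("children", []):
            child = rules[child_name]
            if rule_type(child) == entry_type and re.fullmatch(child["pattern"], entry):
                matched = child_name
                break
        if matched is None:
            errors.append(entry)
        elif entry_type == "directory" and "children" in rules[matched]:
            # Recurse, prefixing nested errors with the directory name.
            nested = validate_dir(matched, subtree, rules)
            errors.extend(f"{entry}/{name}" for name in nested)
    return errors
```

Treating a missing `type` as a file is only one of the two options mentioned above; the other would be to require `type: file` on every file rule.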
A file or directory name could then be described with one of the following combinations:
- entities + suffix + extensions
- suffix + extensions
- ~~literal~~
- ~~literal + extensions~~
- pattern
- pattern + extensions
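The combinations above, and the underscore problem that motivates `pattern`, can be sketched in Python as follows. The helper names and the rule dicts are hypothetical, and the entity keys are simplified to the short forms that appear in filenames (`sub`, `ses`, `task`); this is an illustration under those assumptions, not how the schema is actually interpreted.

```python
import re


def split_name(filename, extensions):
    """Return (stem, extension) for the longest listed extension that matches."""
    for ext in sorted(extensions, key=len, reverse=True):
        if ext and filename.endswith(ext):
            return filename[: -len(ext)], ext
    return None


def matches_entities_suffix(filename, rule):
    """Match 'entities + suffix + extensions' or 'suffix + extensions'."""
    parts = split_name(filename, rule["extensions"])
    if parts is None:
        return False
    stem, _ = parts
    # The underscore separates entities from each other and from the suffix.
    *entity_parts, suffix = stem.split("_")
    if suffix != rule["suffix"]:
        return False
    allowed = rule.get("entities", [])
    return all(
        "-" in part and part.split("-", 1)[0] in allowed for part in entity_parts
    )


def matches_pattern(filename, rule):
    """Match 'pattern' or 'pattern + extensions' using a full regex match."""
    if "extensions" in rule:
        parts = split_name(filename, rule["extensions"])
        if parts is None:
            return False
        filename = parts[0]
    return re.fullmatch(rule["pattern"], filename) is not None


# Hypothetical rules for illustration only.
bold = {"entities": ["sub", "ses", "task"], "suffix": "bold",
        "extensions": [".nii.gz", ".nii"]}
participants = {"suffix": "participants", "extensions": [".tsv"]}
desc_as_suffix = {"suffix": "dataset_description", "extensions": [".json"]}
desc_as_pattern = {"pattern": r"dataset_description", "extensions": [".json"]}
```

With these rules, `desc_as_suffix` fails on `dataset_description.json` because the stem is split on `_` and only `description` is compared against the suffix, while `desc_as_pattern` matches the whole stem without any splitting.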
Another issue, perhaps not for this PR, is how implicit references to other parts of the schema are handled: they are context sensitive. If I have an entry under an `entities` key, I know to go look in `objects.entities`. Here, with `children`, I've referenced keys that appear in the same file without a `$ref` tag.
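To make the difference concrete, here is a small Python sketch of resolving a `children` entry either implicitly (a bare key looked up in the same file) or explicitly (a `$ref`-style pointer). The `resolve_child` helper and the `#/key` pointer form are assumptions made for illustration; the schema does not currently define either.

```python
def resolve_child(child, local_rules):
    """Resolve one `children` entry to a rule dict.

    - A plain string is an implicit reference to a key in the same file.
    - A dict with a "$ref" key is an explicit reference of the assumed
      form "#/key" or "#/key/subkey", resolved against the same mapping.
    """
    if isinstance(child, str):
        return local_rules[child]
    if isinstance(child, dict) and "$ref" in child:
        ref = child["$ref"]
        if not ref.startswith("#/"):
            raise ValueError(f"unsupported reference: {ref}")
        node = local_rules
        for part in ref[2:].split("/"):
            node = node[part]
        return node
    raise TypeError(f"cannot resolve child entry: {child!r}")


# Hypothetical rules: both entries below point at the same rule.
rules = {"derivatives": {"type": "directory", "pattern": "derivatives"}}
implicit = resolve_child("derivatives", rules)
explicit = resolve_child({"$ref": "#/derivatives"}, rules)
```

The implicit form is shorter, but a reader has to know from context where the key lives; the explicit form spells out the lookup location.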
All rules describing valid filenames could also be moved into a single directory to help keep everything together.