
[ENH] schema - specify what is valid at the root of a dataset

Open rwblair opened this issue 2 years ago • 1 comments

prior art: https://github.com/bids-standard/bids-specification/issues/884 https://github.com/ANCPLabOldenburg/ancp-bids/blob/main/ancpbids/data/bids_graph_schema.yaml

Problems

  • The current schema has no explicit way to state that something is a directory as opposed to a file.
  • Filenames are described inconsistently. top_level_files.yaml and associated_data.yaml use the key of the object to specify the filename (excluding extension), while filenames described in datatypes/*.yaml use a combination of entities + suffix + extension.

New keys used in rules to help resolve these issues:

  • type: directory. We currently have no way of saying that the thing being described is a directory. If type is omitted we could assume it's a file, or we could make every file use type: file.
  • children: specifies which file and directory names are valid inside a given directory.
  • ~~literal:~~ Decided to use pattern to cover this use case. Some files in datatypes are simple single-word suffixes with an extension. When interpreting entity-suffix-extension rules we use the underscore as the split character, separating entities from each other and from the suffix. Files like 'dataset_description.json' have an underscore in the string before the extension, so a suffix-extension rule with no entities would have to be understood as 'don't split on _'. literal gets around that by saying: just look for this exact string, and forget about entities, suffixes, and splitting.
  • pattern: pipelines can have any name, so this would be interpreted as a regex to match against the directory or filename. literal and pattern could be merged, since a regex matches literals just fine.
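
To make the underscore problem concrete, here is a rough Python sketch. The helper names `split_entity_style` and `matches_pattern` are made up for illustration and are not part of the schema or the validator; the pipeline-name regex is likewise only an assumption.

```python
import re


def split_entity_style(filename):
    """Naively parse 'key-value_key-value_suffix.ext' by splitting on '_'."""
    # Everything after the first '.' is treated as the extension.
    stem, _, extension = filename.partition(".")
    # The last '_'-separated chunk is taken as the suffix, the rest as entities.
    *entity_parts, suffix = stem.split("_")
    return entity_parts, suffix, "." + extension


def matches_pattern(name, pattern):
    """Treat `pattern` as a regex that must match the whole name."""
    return re.fullmatch(pattern, name) is not None


# Fine for an entity-based name.
print(split_entity_style("sub-01_task-rest_bold.nii.gz"))

# Misreads a top-level file: 'dataset' lands in the entity list
# and 'description' is taken as the suffix.
print(split_entity_style("dataset_description.json"))

# A pattern sidesteps the split entirely, for exact names and free-form ones.
print(matches_pattern("dataset_description.json", r"dataset_description\.json"))
print(matches_pattern("my-pipeline", r"[A-Za-z0-9_-]+"))  # assumed pipeline-name regex
```

Since a regex with no special characters only matches itself, the same `matches_pattern` call covers what literal was meant to do.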

A file or directory name could then be described with one of the following patterns:

  • entities + suffix + extensions
  • suffix + extensions
  • ~~literal~~
  • ~~literal + extensions~~
  • pattern
  • pattern + extensions
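
As a sketch of how type, children, and pattern could fit together, the Python below checks the entries of a directory against a small rule set. Everything here is hypothetical: the `RULES` layout, the rule names, and the helper functions are assumptions for illustration, not the actual schema format. Only the pattern and pattern + extensions forms are covered.

```python
import re

# Hypothetical rule set; key names mirror the proposal, values are illustrative only.
RULES = {
    "root": {
        "type": "directory",
        "children": ["dataset_description", "derivatives"],
    },
    # No 'type' key, so this rule is assumed to describe a file.
    "dataset_description": {"pattern": "dataset_description", "extensions": [".json"]},
    "derivatives": {"type": "directory", "pattern": "derivatives", "children": ["pipeline"]},
    # Pipelines can have any name.
    "pipeline": {"type": "directory", "pattern": ".+"},
}


def rule_matches(rule, name, is_dir):
    """Check one name against one rule (pattern, optionally plus extensions)."""
    wants_dir = rule.get("type", "file") == "directory"
    if wants_dir != is_dir:
        return False
    # A rule without extensions matches the pattern against the whole name.
    extensions = rule.get("extensions", [""])
    return any(
        name.endswith(ext)
        and re.fullmatch(rule["pattern"], name[: len(name) - len(ext)])
        for ext in extensions
    )


def invalid_children(rule_name, entries):
    """Return the names in `entries` (name -> is_dir) that match no child rule."""
    child_rules = [RULES[child] for child in RULES[rule_name].get("children", [])]
    return [
        name
        for name, is_dir in entries.items()
        if not any(rule_matches(rule, name, is_dir) for rule in child_rules)
    ]


listing = {"dataset_description.json": False, "derivatives": True, "notes.txt": False}
print(invalid_children("root", listing))
```

Note that children here refers to other rules by key within the same mapping, with no `$ref`, which is the implicit-reference style the issue mentions.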

Another issue, maybe not for this PR, is how implicit references to other parts of the schema are handled: they are context sensitive. If I have an entry under an entities key, I know to go look in objects.entities. Here, with children, I've referenced keys that appear in the same file without a $ref tag.

rwblair, May 26 '22 19:05

All rules describing valid filenames could also be moved into a single directory to help keep everything together.

rwblair, May 26 '22 19:05