bids-specification
Clarify relation of task entity to TaskName metadata using RFC 2119 MUST/SHOULD/MAY language
For example, this file: https://github.com/bids-standard/bids-examples/blob/7dc73456fe41bf204c1c71f6512293d2f33c6686/eeg_matchingpennies/task-matchingpennies_eeg.json#L2
has the TaskName "Matching Pennies", although the label in the file names is task-matchingpennies.
I think the validator should ensure that these two "fields" match
From the spec: "The task label included in the file name is derived from this TaskName field by removing all non-alphanumeric ([a-zA-Z0-9]) characters. For example TaskName faces n-back will correspond to task label facesnback"
From that I take TaskName as being the generally accepted, more human readable, name of the task with the entity for the task name being properly defined in the filename itself.
The appropriate check might be to take TaskName, strip non-alphanum (convert to lowercase?) and compare that with the found task entity.
agreed! except for the "convert to lowercase", because I think casing is not touched by the spec.
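For concreteness, here is a minimal sketch of the check being discussed, not anything the validator currently does. The function names `derive_task_label` and `label_matches` are made up for illustration, and the `ignore_case` switch exists only because the thread has not settled whether casing should matter.

```python
import re


def derive_task_label(task_name: str) -> str:
    """Drop every character outside [a-zA-Z0-9]; casing is left untouched."""
    return re.sub(r"[^a-zA-Z0-9]", "", task_name)


def label_matches(task_label: str, task_name: str, ignore_case: bool = False) -> bool:
    """Compare the task entity label with the label derived from TaskName."""
    derived = derive_task_label(task_name)
    if ignore_case:
        return task_label.lower() == derived.lower()
    return task_label == derived


# The spec's own example: "faces n-back" corresponds to "facesnback".
print(derive_task_label("faces n-back"))
# The eeg_matchingpennies case only matches if casing is ignored.
print(label_matches("matchingpennies", "Matching Pennies"))
print(label_matches("matchingpennies", "Matching Pennies", ignore_case=True))
```

With a strict, case-sensitive reading, the eeg_matchingpennies example would be flagged, since the derived label is "MatchingPennies".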
Not reading this as REQUIRED language, so if we do this, I think it should be a warning. It doesn't feel extremely helpful, though.
@effigies There doesn't appear to be any validation check that the file's task entity label is derived from TaskName.
Example: "TaskName": "Any Task Name" will not trigger the validator for entity _task-resting_
From the spec: "The task label included in the file name is derived from this TaskName field by removing all non-alphanumeric ([a-zA-Z0-9]) characters. For example TaskName faces n-back will correspond to task label facesnback"
the citation: https://bids-specification.readthedocs.io/en/stable/modality-specific-files/magnetic-resonance-imaging-data.html#task-metadata-for-anatomical-scans
Maintainer notes, March 6: The JSON field is RECOMMENDED, not REQUIRED, but if it is present and the naming derivation isn't respected, that is challenging for platforms and tools downstream of the BIDS validator.
Implementing this check (error or warning, to be confirmed) in the validator means adding string checking. It could be raised for community discussion with an eye to backwards compatibility: what would implementing this break? @effigies to do a quick grep check to see how common this issue is in, e.g., OpenNeuro.
Of the datasets on OpenNeuro, 1119 have files with both a task entity and TaskName in some JSON sidecar. Of those, 321 do not satisfy the criterion that entity['task'].lower() == re.sub(r'[^a-z0-9]', '', metadata['TaskName'].lower()).
For the interested, this is one mismatch from each dataset that was found to have mismatches: mismatches.csv
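A rough sketch of how that criterion could be applied to a single file name and its sidecar metadata follows. This is not the script used for the OpenNeuro survey; the helper names `task_entity` and `is_consistent` and the regular expression for pulling the task entity out of a file name are assumptions for illustration only.

```python
import re
from typing import Optional

# Assumed pattern: the task entity is "task-<label>" at the start of the
# file name or after an underscore, with an alphanumeric label.
TASK_ENTITY = re.compile(r"(?:^|_)task-([a-zA-Z0-9]+)")


def task_entity(filename: str) -> Optional[str]:
    """Return the task label found in a file name, or None if there is none."""
    match = TASK_ENTITY.search(filename)
    return match.group(1) if match else None


def is_consistent(filename: str, metadata: dict) -> Optional[bool]:
    """Case-insensitive criterion from the survey; None if it does not apply."""
    label = task_entity(filename)
    name = metadata.get("TaskName")
    if label is None or name is None:
        return None  # no task entity or no TaskName: nothing to compare
    return label.lower() == re.sub(r"[^a-z0-9]", "", name.lower())


print(is_consistent("task-matchingpennies_eeg.json", {"TaskName": "Matching Pennies"}))
print(is_consistent("sub-01_task-resting_bold.json", {"TaskName": "Any Task Name"}))
```

Note that this criterion ignores case, so it is more lenient than a literal reading of the spec sentence; a case-sensitive comparison would presumably flag more datasets.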
Note that for MRI the TaskName field is 'RECOMMENDED' (link above).
But for EEG, TaskName is 'REQUIRED', so this check would ideally be enforced in the validator: https://bids-specification.readthedocs.io/en/stable/modality-specific-files/electroencephalography.html#task-information
The presence of the field is enforced by the validator, if it's REQUIRED. If this convention is to become a validatable rule, then it needs a schema rule, e.g.:
TaskNameConsistency:
  selectors:
    - "'TaskName' in sidecar"
    - entities.task
  checks:
    - lower(entities.task) == lower(strip(sidecar.TaskName))
(This assumes that lower() and strip() become functions and have meanings that are appropriate for this check. We can bikeshed this separately.)
More importantly, if this is to become a validated check, then I also think the spec should clearly state this as a requirement using RFC 2119 language (change bolded):
The task label included in the file name **MUST be** derived from this TaskName field by removing all non-alphanumeric ([a-zA-Z0-9]) characters.
Given the broad (>25% of relevant OpenNeuro datasets) interpretation of this as a convention and not a requirement, I would consider this a backwards incompatible change. This is not a decision for maintainers to make without community consensus and I think the steering group should make the final decision.
I would be curious to see a link to code that actually depends on this correspondence, by the way. I understand that there is some EEG code that does depend on it, but it's not obvious what someone would use this for.