bids-validator
participants.tsv participant_id not checked closely enough
ds001971 participants.tsv has two problems:
participant_id participant_id sex age handedness
sub-001 CQ2 F 18-38 R
sub-002 BT4 F 28 R
...
First, there are two participant_id columns. So it seems like the column names should be checked for uniqueness, and an error raised if they are not unique.
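A uniqueness check of this kind could look roughly like the sketch below. It is an illustration only, not the validator's actual implementation; the function name `find_duplicate_headers` is hypothetical, and the sample text assumes the columns in the file are tab-separated.

```python
from collections import Counter


def find_duplicate_headers(tsv_text: str) -> list[str]:
    """Return column names that occur more than once in the header row.

    Hypothetical helper for illustration; not part of bids-validator.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return []
    columns = lines[0].split("\t")
    counts = Counter(columns)
    # Counter keeps first-seen order, so the report is deterministic.
    return [name for name, n in counts.items() if n > 1]


# Sample mirroring the first rows of the ds001971 participants.tsv.
sample = (
    "participant_id\tparticipant_id\tsex\tage\thandedness\n"
    "sub-001\tCQ2\tF\t18-38\tR\n"
    "sub-002\tBT4\tF\t28\tR\n"
)

duplicates = find_duplicate_headers(sample)
if duplicates:
    print(f"Duplicate column names: {duplicates}")
```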
Second, when parsed, the subjects show up as CQ2, etc.:
https://openneuro.org/datasets/ds001971/versions/1.1.1/file-display/participants.tsv
Both OpenNeuro and MNE-BIDS parse this file such that the subject names are CQ2, etc., which are non-compliant because names should be "of the form sub-*". This makes me think there might be a bug where BIDS-Validator does not check the subject names for sub-* conformance. Otherwise, the BIDS-Validator should have emitted an error (or at least a warning!) for this dataset, assuming it also internally saw the participant_ids as CQ2 etc. instead of sub-001 etc. However, the only warning is about non-uniform file sets.
But maybe once the first part of the bug (duplicate column names) is fixed, the validator would see the non-compliant names if they were supplied as CQ2 etc., in which case the second "bug" would be fixed by fixing the first bug.
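One plausible way a parser ends up with CQ2 instead of sub-001 is that it stores each row as a mapping keyed by column name, so the second participant_id column silently overwrites the first. The sketch below shows this with Python's standard `csv.DictReader`; it is only an illustration of the mechanism, not a claim about how OpenNeuro, MNE-BIDS, or the validator actually read the file, and the sample assumes tab-separated columns.

```python
import csv
import io

# Sample mirroring the first rows of the ds001971 participants.tsv.
sample = (
    "participant_id\tparticipant_id\tsex\tage\thandedness\n"
    "sub-001\tCQ2\tF\t18-38\tR\n"
    "sub-002\tBT4\tF\t28\tR\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")

# Each row becomes a dict; with duplicate keys the later column wins,
# so the first participant_id column is lost without any error.
ids = [row["participant_id"] for row in reader]
print(ids)  # ['CQ2', 'BT4']
```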
Thanks for this report @larsoner! This is certainly a bug ... we should not allow for duplicate column headers. This is also explicitly required in the specification, see: https://bids-specification.readthedocs.io/en/latest/02-common-principles.html#tabular-files
> Furthermore, column names MUST NOT be blank (that is, an empty string) and MUST NOT be duplicated within a single TSV file.
Regarding your other point, I think you are correct:
> But maybe once the first part of the bug (duplicate column names) is fixed, the validator would see the non-compliant names if they were supplied as CQ2 etc., in which case the second "bug" would be fixed by fixing the first bug.
because if I introduce the non-compliant names (CQ2 etc.) into a dataset that has only a single participant_id column, I do get a sensible error:
2: [ERR] Participant_id column labels must consist of the pattern "sub-<subject_id>". (code: 212 - PARTICIPANT_ID_PATTERN)
./participants.tsv
@ line: 1
Evidence: Column headers: participant_id, age, sex, handedness
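For reference, the kind of check behind an error like this can be sketched as follows. This is not the validator's code: the function name `non_compliant_ids` is hypothetical, and the regular expression assumes that a subject label is one or more alphanumeric characters after the `sub-` prefix.

```python
import re

# Assumption: a label is one or more alphanumeric characters after "sub-".
PARTICIPANT_ID_RE = re.compile(r"sub-[0-9a-zA-Z]+")


def non_compliant_ids(participant_ids):
    """Return the IDs that do not fully match the sub-<label> pattern."""
    return [
        pid for pid in participant_ids
        if PARTICIPANT_ID_RE.fullmatch(pid) is None
    ]


print(non_compliant_ids(["sub-001", "CQ2", "sub-002", "BT4"]))  # ['CQ2', 'BT4']
```

Using `fullmatch` rather than `match` means values such as `sub-001 ` (trailing space) or `sub-` (empty label) are also reported.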
what's weird is that checks for duplicate columns were introduced in:
- https://github.com/bids-standard/bids-validator/pull/1488
and the validator has been released since then, so that feature is available outside of dev versions :thinking:
ohhh ... of course it could be that ds001971 is just not passing bids-validator anymore ... but it's already up on OpenNeuro and the validation was run BEFORE the check for duplicates was released. :thinking:
it'd be good to show a date when datasets on OpenNeuro were validated, together with the bids-validator version that was used for validation. WDYT @effigies @rwblair @nellh ?