Validation of ds005256 never completes
https://openneuro.org/datasets/ds005256
I pushed with git yesterday, but as of today validation still has not completed.
How long does the validator take to run locally?
We've clocked it, and it takes 15 minutes on our end. My best guess is that the HED tags are causing a slowdown.
details -- about 5 minutes
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ duct deno run --allow-read --allow-env --reload -A https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js --outfile logs/deno-bids-validator.log $PWD
2025-01-28T16:41:25-0500 [INFO ] con-duct: duct is executing 'deno run --allow-read --allow-env --reload -A https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js --outfile logs/deno-bids-validator.log /mnt/DATA/data/yoh/1076_spacetop'...
2025-01-28T16:41:25-0500 [INFO ] con-duct: Log files will be written to .duct/logs/2025.01.28T16.41.25-3688580_
Download https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js
Download https://raw.githubusercontent.com/bids-standard/bids-validator/deno-build/bids-validator.js
2025-01-28T16:46:17-0500 [INFO ] con-duct: Summary:
Exit Code: 1
Command: deno run --allow-read --allow-env --reload -A https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js --outfile logs/deno-bids-validator.log /mnt/DATA/data/yoh/1076_spacetop
Log files location: .duct/logs/2025.01.28T16.41.25-3688580_
Wall Clock Time: 291.983 sec
Memory Peak Usage (RSS): 505.5 MB
Memory Average Usage (RSS): 438.6 MB
Virtual Memory Peak Usage (VSZ): 21.1 GB
Virtual Memory Average Usage (VSZ): 21.0 GB
Memory Peak Percentage: 0.0%
Memory Average Percentage: 0.0%
CPU Peak Usage: 164.0%
Average CPU Usage: 160.09090909090904%
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ grep ERROR logs/deno-bids-validator.log
[ERROR] PARTICIPANT_ID_MISMATCH Participant labels found in this dataset did not match the values in participant_id column
[ERROR] HED_ERROR The validation on this HED string returned an error.
/task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
/task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.
[ERROR] HED_INTERNAL_ERROR An internal error occurred during HED validation.
GENERIC_ERROR
/sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-01_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
/sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-02_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
Please visit https://neurostars.org/search?q=HED_INTERNAL_ERROR for existing conversations about this issue.
[ERROR] STIMULUS_FILE_MISSING A stimulus file was declared but not found in the dataset.
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ tail logs/deno-bids-validator.log
Summary:                       Available Tasks:    Available Modalities:
26373 Files, 2.23 TB           alignvideo          MRI
118 - Subjects 4 - Sessions    faces
                               fractional
                               narratives
                               shortvideo
                               social
If you have any questions, please post on https://neurostars.org/tags/bids.
So it took about 5 minutes... The initial run, which failed to save the log because the leading directory was missing, took about the same.
So is the problem that the job times out and is never reported as completed?
ping on this -- any way to get over the validation hump?
Hopefully https://github.com/bids-standard/bids-validator/pull/156 produced some speedup. Running it now.
Definitely shorter at 3m50s:
[ERROR] HED_ERROR The validation on this HED string returned an error.
None
/task-fractional_events.json - TypeError: Cannot convert undefined or null to object
/sub-0038/ses-04/func/sub-0038_ses-04_task-fractional_acq-mb8_run-01_events.tsv - TypeError: Cannot convert undefined or null to object
203 more files with the same issue
TAG_EXTENSION_INVALID
/task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
/task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
9068 more files with the same issue
Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.
[ERROR] PARTICIPANT_ID_MISMATCH Participant labels found in this dataset did not match the values in participant_id column
found in the participants.tsv file.
/participants.tsv
Please visit https://neurostars.org/search?q=PARTICIPANT_ID_MISMATCH for existing conversations about this issue.
[ERROR] STIMULUS_FILE_MISSING A stimulus file was declared but not found in the dataset.
/sub-0038/ses-02/func/sub-0038_ses-02_task-narratives_acq-mb8_run-03_events.tsv
/sub-0084/ses-03/func/sub-0084_ses-03_task-alignvideo_acq-mb8_run-01_events.tsv
10 more files with the same issue
Please visit https://neurostars.org/search?q=STIMULUS_FILE_MISSING for existing conversations about this issue.
[ERROR] HED_INTERNAL_ERROR An internal error occurred during HED validation.
GENERIC_ERROR
/sub-0038/ses-02/func/sub-0038_ses-02_task-narratives_acq-mb8_run-03_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
/sub-0084/ses-02/func/sub-0084_ses-02_task-faces_acq-mb8_run-01_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
325 more files with the same issue
Please visit https://neurostars.org/search?q=HED_INTERNAL_ERROR for existing conversations about this issue.
That said, it's still not showing as completed on the draft page after I retriggered validation. I don't know why.
@nellh any ideas on this issue of validation never completing? FWIW, I have pushed a few more times, and it never finished spinning. Maybe it relates to the fact that there are LOTS of files and warnings for them. E.g. here is the annexed JSON output https://github.com/spatialtopology/ds005256/blob/master/derivatives/bids-validator/deno-bids-validator.json (let me know if I should share it somewhere) and the annex key says it is 100MB in size!
The next release of OpenNeuro (planned for this week) will include @bids/validator 2.0.3 with several fixes here. We now limit warnings to 50k on OpenNeuro to avoid the issues we previously had with very verbose datasets running into size limits for the output.
- limits are good, but still there should be some feedback then to the user, like "too many warnings/errors detected, run validator locally"
- @effigies also mentioned that this dataset succumbed to some "poisoned" json output issue which was addressed too.
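A rough sketch of what the feedback suggested above could look like. To be clear, this is not how OpenNeuro or the validator implements the limit; the function name `cap_warnings` and the notice wording are made up, and the only things taken from this thread are the 50k figure and the `severity` field seen in the JSON output.

```python
def cap_warnings(issues, limit=50_000):
    """Keep every non-warning issue but at most `limit` warnings.

    Returns (kept_issues, notice). `notice` is None when nothing was
    dropped, otherwise a message that could be shown to the user.
    Hypothetical helper, for illustration only.
    """
    kept = []
    n_warnings = 0
    for issue in issues:
        if issue.get("severity") == "warning":
            n_warnings += 1
            if n_warnings > limit:
                continue  # over the cap: drop, but keep counting
        kept.append(issue)

    dropped = max(0, n_warnings - limit)
    notice = None
    if dropped:
        notice = (
            f"Warnings omitted: {dropped} (limit {limit}). "
            "Run the validator locally for the full report."
        )
    return kept, notice
```

The point is only that the truncation count is known at the moment the cap is applied, so it costs nothing to pass it on to the user.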
I see the website says it is at 4.32.0, which was released yesterday, so I guess that should have addressed the above... I pushed a new commit (thank you codespell for always finding something to fix ;) ) -- and "Validation Pending" seems to persist.
OK, the validator is ERROR-free now! There are still 100s of thousands of warnings, since it is a very rich dataset with lots of good files.
here is a summary over those 50ish types of warnings with counts
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ jq -r '.issues.issues[] | "\(.code) \(.subCode) \(.severity)"' derivatives/bids-validator/deno-bids-validator.json | sort | uniq -c | sort -n | nl
1 1 JSON_KEY_RECOMMENDED SourceDatasets warning
2 5 SIDECAR_KEY_RECOMMENDED SliceTiming warning
3 5 SIDECAR_KEY_RECOMMENDED TaskName warning
4 9 SUSPICIOUSLY_LONG_EVENT_DESIGN null warning
5 17 SUSPICIOUSLY_SHORT_EVENT_DESIGN null warning
6 48 TSV_ADDITIONAL_COLUMNS_UNDEFINED question warning
7 122 SIDECAR_KEY_RECOMMENDED DeviceSerialNumber warning
8 122 SIDECAR_KEY_RECOMMENDED DwellTime warning
9 122 SIDECAR_KEY_RECOMMENDED EchoTime warning
10 122 SIDECAR_KEY_RECOMMENDED FlipAngle warning
11 122 SIDECAR_KEY_RECOMMENDED InstitutionAddress warning
12 122 SIDECAR_KEY_RECOMMENDED InstitutionalDepartmentName warning
13 122 SIDECAR_KEY_RECOMMENDED MagneticFieldStrength warning
14 122 SIDECAR_KEY_RECOMMENDED ManufacturersModelName warning
15 122 SIDECAR_KEY_RECOMMENDED Manufacturer warning
16 122 SIDECAR_KEY_RECOMMENDED PartialFourier warning
17 122 SIDECAR_KEY_RECOMMENDED PulseSequenceDetails warning
18 122 SIDECAR_KEY_RECOMMENDED ReceiveCoilActiveElements warning
19 122 SIDECAR_KEY_RECOMMENDED ReceiveCoilName warning
20 122 SIDECAR_KEY_RECOMMENDED ScanningSequence warning
21 122 SIDECAR_KEY_RECOMMENDED ScanOptions warning
22 122 SIDECAR_KEY_RECOMMENDED SequenceName warning
23 122 SIDECAR_KEY_RECOMMENDED SequenceVariant warning
24 122 SIDECAR_KEY_RECOMMENDED SoftwareVersions warning
25 122 SIDECAR_KEY_RECOMMENDED StationName warning
26 127 SIDECAR_KEY_RECOMMENDED InstitutionName warning
27 239 SIDECAR_KEY_RECOMMENDED EffectiveEchoSpacing warning
28 239 SIDECAR_KEY_RECOMMENDED PhaseEncodingDirection warning
29 239 SIDECAR_KEY_RECOMMENDED TotalReadoutTime warning
30 1813 TSV_COLUMN_TYPE_REDEFINED duration warning
31 1969 SIDECAR_KEY_RECOMMENDED B0FieldIdentifier warning
32 4178 SIDECAR_KEY_RECOMMENDED AcquisitionDuration warning
33 4178 SIDECAR_KEY_RECOMMENDED CogPOID warning
34 4178 SIDECAR_KEY_RECOMMENDED DelayAfterTrigger warning
35 4178 SIDECAR_KEY_RECOMMENDED DelayTime warning
36 4178 SIDECAR_KEY_RECOMMENDED Instructions warning
37 4178 SIDECAR_KEY_RECOMMENDED NumberOfVolumesDiscardedByScanner warning
38 4178 SIDECAR_KEY_RECOMMENDED NumberOfVolumesDiscardedByUser warning
39 4178 SIDECAR_KEY_RECOMMENDED TaskDescription warning
40 4221 SIDECAR_KEY_RECOMMENDED StimulusPresentation warning
41 6065 SIDECAR_KEY_RECOMMENDED MultibandAccelerationFactor warning
42 10352 SIDECAR_KEY_RECOMMENDED SliceEncodingDirection warning
43 10474 SIDECAR_KEY_RECOMMENDED InversionTime warning
44 10474 SIDECAR_KEY_RECOMMENDED ParallelReductionFactorInPlane warning
45 10591 SIDECAR_KEY_RECOMMENDED CoilCombinationMethod warning
46 10591 SIDECAR_KEY_RECOMMENDED GradientSetType warning
47 10591 SIDECAR_KEY_RECOMMENDED MatrixCoilMode warning
48 10591 SIDECAR_KEY_RECOMMENDED MixingTime warning
49 10591 SIDECAR_KEY_RECOMMENDED MRTransmitCoilSequence warning
50 10591 SIDECAR_KEY_RECOMMENDED MTState warning
51 10591 SIDECAR_KEY_RECOMMENDED NumberShots warning
52 10591 SIDECAR_KEY_RECOMMENDED ParallelAcquisitionTechnique warning
53 10591 SIDECAR_KEY_RECOMMENDED ParallelReductionFactorOutOfPlane warning
54 10591 SIDECAR_KEY_RECOMMENDED PartialFourierDirection warning
55 10591 SIDECAR_KEY_RECOMMENDED PulseSequenceType warning
56 10591 SIDECAR_KEY_RECOMMENDED SpoilingState warning
57 14328 HED_WARNING SIDECAR_KEY_MISSING warning
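For anyone without jq at hand, the same tally can be sketched in Python. This assumes only the layout implied by the jq filter above (an `issues.issues` list whose entries have `code`, `subCode` and `severity`); the function name `tally_issues` is made up for illustration.

```python
import json
from collections import Counter


def tally_issues(report: dict) -> list[tuple[int, str]]:
    """Count issues per "code subCode severity" label, smallest count first.

    Assumes report["issues"]["issues"] is a list of dicts, as implied by
    the jq filter `.issues.issues[]`. Missing values print as "null",
    the way jq renders them.
    """
    def show(value):
        return "null" if value is None else str(value)

    counts = Counter(
        " ".join(show(issue.get(key)) for key in ("code", "subCode", "severity"))
        for issue in report["issues"]["issues"]
    )
    # Sort by count, then label: roughly `sort | uniq -c | sort -n`.
    return sorted((n, label) for label, n in counts.items())


def tally_file(path: str) -> list[tuple[int, str]]:
    """Load a validator JSON report from disk and tally it."""
    with open(path) as f:
        return tally_issues(json.load(f))
```

Something like `for n, label in tally_file("derivatives/bids-validator/deno-bids-validator.json"): print(n, label)` should then print roughly the same list as above, minus the `nl` numbering.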
With this, we have done everything possible to make our dataset worthy of publishing publicly! And the reviewers at Scientific Data demanded that it be published publicly, given the errors and TODOs we used to have.
I am starting to wonder if I should succumb to the dark ways of the .bidsignore... please do not let me go down that dark path
Disabling the SIDECAR_KEY_RECOMMENDED warning allows the validation output to be saved; with it enabled, the JSON was running into the 50MB maximum size of the POST body. I've done this manually for now, but we should see if we can address the JSON output size on the validator side somehow.
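To get a feel for how much a single warning code contributes to that size, here is a small sketch. It assumes the same `issues.issues` layout as the local JSON report; `size_without` is a made-up name, and plain `json.dumps` will not match byte-for-byte whatever OpenNeuro actually POSTs, so treat the numbers as estimates only.

```python
import json


def size_without(report: dict, dropped_codes: set[str]) -> tuple[int, int]:
    """Return (bytes_before, bytes_after) for the serialized report.

    `bytes_after` is the size once every issue whose "code" is in
    `dropped_codes` has been removed. The input report is not modified.
    Hypothetical helper; sizes are for json.dumps output, UTF-8 encoded.
    """
    before = len(json.dumps(report).encode("utf-8"))

    kept = [
        issue
        for issue in report["issues"]["issues"]
        if issue.get("code") not in dropped_codes
    ]
    # Shallow copies so the caller's report stays untouched.
    trimmed = {**report, "issues": {**report["issues"], "issues": kept}}
    after = len(json.dumps(trimmed).encode("utf-8"))
    return before, after
```

For example, `size_without(report, {"SIDECAR_KEY_RECOMMENDED"})` on the loaded report would show whether dropping that one code is enough to get under 50MB.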
Awesome, thank you @nellh ! #3184
and we also seem to have lots of TSV_COLUMN_TYPE_REDEFINED -- someone might have been "too thorough".
we still have
with Last Updated 2025-09-08 - 16 days ago which forbids publishing a new version.
how could we get over the hump to be able to publish a new version?
Probably missing data:
# git annex list | grep -v ^X
here
|github
||s3-PUBLIC
|||web
||||bittorrent
|||||
_____ sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json
_____ sub-0019/ses-04/func/sub-0019_ses-04_task-fractional_acq-mb8_run-01_bold.json
_____ sub-0019/ses-04/func/sub-0019_ses-04_task-fractional_acq-mb8_run-02_bold.json
You may want to unannex those files and push them.
well -- those were annexed since "large" (over 100kb) and we had a rule
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ tail -n 1 .gitattributes
* annex.largefiles=(largerthan=100kb)
and overall we had over 1000 JSON files added under git-annex for that reason (otherwise it would be about 140MB of JSON files alone to commit to git)
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ find sub* -iname \*.json -type l | wc -l
1231
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ find sub* -iname \*.json -type l | xargs du -scmL
...
142 total
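A rough Python equivalent of the two `find` commands above, in case someone wants to repeat the check elsewhere. It is only a sketch: `annexed_json_usage` is a made-up name, it treats every symlinked `*.json` under `sub*` as annexed, and it sums apparent file sizes rather than the disk usage that `du` reports, so totals can differ a little.

```python
import os
from pathlib import Path


def annexed_json_usage(root: str) -> tuple[int, int]:
    """Return (count, total_bytes) for symlinked *.json files under sub*/.

    Mirrors `find sub* -iname \\*.json -type l`, then follows each link
    (like `du -L`) to add up the size of the content it points to.
    """
    count = 0
    total = 0
    for sub in Path(root).glob("sub*"):
        for dirpath, _dirnames, filenames in os.walk(sub):
            for name in filenames:
                path = Path(dirpath) / name
                if not name.lower().endswith(".json") or not path.is_symlink():
                    continue
                count += 1
                # exists() follows the link; it is False when the annexed
                # content has been dropped locally, so skip the size then.
                if path.exists():
                    total += path.stat().st_size
    return count, total
```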
Is it a requirement that all JSON files be directly under git control to pass validation? (Here I could force committing them into git, but would ideally prefer to avoid that.)
AFAIK they are all on openneuro as well:
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex find --not --in openneuro
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$
I'm not sure what to do when your repo says we have it and ours says we don't. Can you fsck our remote?
I have done a few rounds of git annex copy on those files with the hopes they arrive and also did drop one of them and re-fetched from the openneuro remote
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex drop sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json
drop sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json (locking origin...) ok
(recording state in git...)
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex get --from openneuro sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json
get sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json (from openneuro...) ok
(recording state in git...)
We pushed a bogus (empty) commit, and "Validation Pending" has still been there for 10 minutes. Could you please check whether it is the same issue or something new?
I fscked those files and git annex found them. I'm not sure if I forgot to do that last time, but either way it looks like it's here.
The validator may be timing out, or it might just be in the task queue (@nellh?). I ran it manually with --ignoreWarnings and got these errors:
[ERROR] TSV_VALUE_INCORRECT_TYPE A value in a column did not match the acceptable type for that column headers specified format.
trial_type
/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-04_events.tsv - 'stimulus'
/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-02_events.tsv - 'stimulus'
1805 more files with the same issue
rating_glmslabel
/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-04_events.tsv - 'Very-Strong'
/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-02_events.tsv - 'Barely-detectable'
1493 more files with the same issue
rating_glmslabel_fillna
/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-04_events.tsv - 'Very-Strong'
/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-02_events.tsv - 'Barely-detectable'
1576 more files with the same issue
value
/sub-0038/ses-04/func/sub-0038_ses-04_task-fractional_acq-mb8_run-02_events.tsv - 'falsebelief'
/sub-0021/ses-04/func/sub-0021_ses-04_task-fractional_acq-mb8_run-01_events.tsv - 'falsebelief'
49 more files with the same issue
duration
/sub-0084/ses-02/func/sub-0084_ses-02_task-faces_acq-mb8_run-01_events.tsv - '-0.017'
/sub-0021/ses-02/func/sub-0021_ses-02_task-faces_acq-mb8_run-03_events.tsv - '-0.017'
200 more files with the same issue
event_type
/sub-0084/ses-04/func/sub-0084_ses-04_task-fractional_acq-mb8_run-01_events.tsv - 'how_hand'
/sub-0078/ses-04/func/sub-0078_ses-04_task-fractional_acq-mb8_run-01_events.tsv - 'how_hand'
46 more files with the same issue
Please visit https://neurostars.org/search?q=TSV_VALUE_INCORRECT_TYPE for existing conversations about this issue.
[ERROR] HED_ERROR The validation on this HED string returned an error.
VALUE_INVALID
/sub-0084/ses-04/func/sub-0084_ses-04_task-fractional_acq-mb8_run-01_events.tsv - ERROR: [VALUE_INVALID] Invalid placeholder value for tag "Organizational-property/Control-variable/reaching?". Tag "Control-variable" has value classes [nameClass] but has value "reaching?" is not in any of them. TSV line: "2". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#value-invalid.)
/sub-0084/ses-04/func/sub-0084_ses-04_task-fractional_acq-mb8_run-01_events.tsv - ERROR: [VALUE_INVALID] Invalid placeholder value for tag "Organizational-property/Control-variable/reaching?". Tag "Control-variable" has value classes [nameClass] but has value "reaching?" is not in any of them. TSV line: "3". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#value-invalid.)
13822 more files with the same issue
Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.
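Of the TSV_VALUE_INCORRECT_TYPE groups above, the `duration` one looks the easiest to chase down, presumably because values like '-0.017' are negative (that reading is a guess; the log only says the value does not match the column's type). A small sketch for listing such rows in an events file; `negative_durations` is a made-up helper, not part of the validator.

```python
import csv
import io


def negative_durations(tsv_text: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for rows whose `duration` is below zero.

    Line numbers are 1-based and count the header as line 1. "n/a",
    empty and other non-numeric values are skipped, since this only
    looks for the sign problem.
    """
    hits = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_no, row in enumerate(reader, start=2):
        value = (row.get("duration") or "").strip()
        try:
            if float(value) < 0:
                hits.append((line_no, value))
        except ValueError:
            continue  # not a number, e.g. "n/a"
    return hits
```

Calling it on the text of one of the flagged `*_events.tsv` files should point at the offending rows, which may help decide whether these are rounding artifacts to clamp to 0 or something to fix upstream.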