Validation of ds005256 never completes

Open yarikoptic opened this issue 11 months ago • 20 comments

https://openneuro.org/datasets/ds005256

I git pushed yesterday, but it is still

Image

as of today.

yarikoptic avatar Jan 28 '25 16:01 yarikoptic

How long does the validator take to run locally?

effigies avatar Jan 28 '25 16:01 effigies

We've clocked it, and it takes 15 minutes on our end. My best guess is that the HED tags are causing a slowdown.

effigies avatar Jan 28 '25 19:01 effigies

details -- about 5 minutes
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ duct deno run --allow-read --allow-env --reload -A https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js --outfile logs/deno-bids-validator.log $PWD
2025-01-28T16:41:25-0500 [INFO    ] con-duct: duct is executing 'deno run --allow-read --allow-env --reload -A https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js --outfile logs/deno-bids-validator.log /mnt/DATA/data/yoh/1076_spacetop'...
2025-01-28T16:41:25-0500 [INFO    ] con-duct: Log files will be written to .duct/logs/2025.01.28T16.41.25-3688580_
Download https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js
Download https://raw.githubusercontent.com/bids-standard/bids-validator/deno-build/bids-validator.js
2025-01-28T16:46:17-0500 [INFO    ] con-duct: Summary:
Exit Code: 1
Command: deno run --allow-read --allow-env --reload -A https://github.com/bids-standard/bids-validator/raw/deno-build/bids-validator.js --outfile logs/deno-bids-validator.log /mnt/DATA/data/yoh/1076_spacetop
Log files location: .duct/logs/2025.01.28T16.41.25-3688580_
Wall Clock Time: 291.983 sec
Memory Peak Usage (RSS): 505.5 MB
Memory Average Usage (RSS): 438.6 MB
Virtual Memory Peak Usage (VSZ): 21.1 GB
Virtual Memory Average Usage (VSZ): 21.0 GB
Memory Peak Percentage: 0.0%
Memory Average Percentage: 0.0%
CPU Peak Usage: 164.0%
Average CPU Usage: 160.09090909090904%

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ grep ERROR logs/deno-bids-validator.log
        [ERROR] PARTICIPANT_ID_MISMATCH Participant labels found in this dataset did not match the values in participant_id column
        [ERROR] HED_ERROR The validation on this HED string returned an error.
                /task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
                /task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
        Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.
        [ERROR] HED_INTERNAL_ERROR An internal error occurred during HED validation.
                GENERIC_ERROR
                /sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-01_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
                /sub-0001/ses-02/func/sub-0001_ses-02_task-faces_acq-mb8_run-02_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
        Please visit https://neurostars.org/search?q=HED_INTERNAL_ERROR for existing conversations about this issue.
        [ERROR] STIMULUS_FILE_MISSING A stimulus file was declared but not found in the dataset.
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ tail logs/deno-bids-validator.log

          Summary:                           Available Tasks:        Available Modalities:
          26373 Files, 2.23 TB               alignvideo              MRI                  
          118 - Subjects 4 - Sessions        faces                                        
                                             fractional                                   
                                             narratives                                   
                                             shortvideo                                   
                                             social                                       

        If you have any questions, please post on https://neurostars.org/tags/bids.

so it took about 5 minutes... the initial run, which failed to save the log due to a missing leading dir, took about the same.

yarikoptic avatar Jan 29 '25 01:01 yarikoptic

so is the problem that the job times out and is never reported as completed?

yarikoptic avatar Jan 31 '25 17:01 yarikoptic

ping on this -- any way to get over the validation hump?

yarikoptic avatar Feb 20 '25 01:02 yarikoptic

Hopefully https://github.com/bids-standard/bids-validator/pull/156 produced some speedup. Running it now.

effigies avatar Feb 20 '25 02:02 effigies

Definitely shorter at 3m50s:

	[ERROR] HED_ERROR The validation on this HED string returned an error.
		None
		/task-fractional_events.json - TypeError: Cannot convert undefined or null to object
		/sub-0038/ses-04/func/sub-0038_ses-04_task-fractional_acq-mb8_run-01_events.tsv - TypeError: Cannot convert undefined or null to object

		203 more files with the same issue

		TAG_EXTENSION_INVALID
		/task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)
		/task-social_events.json - ERROR: [TAG_EXTENSION_INVALID] "Categorical-value" appears as "Property/Data-property/Data-value/Categorical-value" and cannot be used as an extension. Indices ([object Object], ). (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#tag-extension-invalid.)

		9068 more files with the same issue

	Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.

	[ERROR] PARTICIPANT_ID_MISMATCH Participant labels found in this dataset did not match the values in participant_id column
found in the participants.tsv file.

		/participants.tsv

	Please visit https://neurostars.org/search?q=PARTICIPANT_ID_MISMATCH for existing conversations about this issue.

	[ERROR] STIMULUS_FILE_MISSING A stimulus file was declared but not found in the dataset.

		/sub-0038/ses-02/func/sub-0038_ses-02_task-narratives_acq-mb8_run-03_events.tsv
		/sub-0084/ses-03/func/sub-0084_ses-03_task-alignvideo_acq-mb8_run-01_events.tsv

		10 more files with the same issue

	Please visit https://neurostars.org/search?q=STIMULUS_FILE_MISSING for existing conversations about this issue.

	[ERROR] HED_INTERNAL_ERROR An internal error occurred during HED validation.
		GENERIC_ERROR
		/sub-0038/ses-02/func/sub-0038_ses-02_task-narratives_acq-mb8_run-03_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)
		/sub-0084/ses-02/func/sub-0084_ses-02_task-faces_acq-mb8_run-01_events.tsv - ERROR: [GENERIC_ERROR] Internal error - message: "Attempting to access the onset of a TSV row without one.". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#generic-error.)

		325 more files with the same issue

	Please visit https://neurostars.org/search?q=HED_INTERNAL_ERROR for existing conversations about this issue.

That said, it's still not showing as completed on the draft page after I retriggered validation. I don't know why.

effigies avatar Feb 20 '25 02:02 effigies

@nellh any ideas on this issue of validation never completing? FWIW, I have pushed a few more times, and it never finished spinning. Maybe it relates to the fact that there are LOTS of files/warnings for them. E.g. here is the annexed json output https://github.com/spatialtopology/ds005256/blob/master/derivatives/bids-validator/deno-bids-validator.json (let me know if I should share it somewhere), and the annex key says it is 100MB in size!

yarikoptic avatar Feb 24 '25 17:02 yarikoptic

The next release of OpenNeuro (planned for this week) will include @bids/validator 2.0.3 with several fixes here. We do limit warnings to 50k on OpenNeuro now, to avoid the earlier issues where datasets with very verbose warnings ran into size limits for the output.

nellh avatar Feb 24 '25 19:02 nellh

  • limits are good, but still there should be some feedback then to the user, like "too many warnings/errors detected, run validator locally"
  • @effigies also mentioned that this dataset succumbed to some "poisoned" json output issue which was addressed too.

I see the website says it is 4.32.0, which was released yesterday, so I guess it should have addressed the above... I pushed a new commit (thank you codespell for always finding something to fix ;) ) -- and "Validation Pending" seems to persist.

yarikoptic avatar Feb 27 '25 19:02 yarikoptic
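
The feedback requested in the first bullet above could look roughly like the following. This is only a sketch: `cap_issues` and the wording of the notice are hypothetical, the issue dicts are assumed to carry a `severity` key as in the validator's JSON output, and the 50k figure is the limit nellh mentions. It is not OpenNeuro's actual implementation.

```python
from typing import Any, Optional

MAX_WARNINGS = 50_000  # warning limit mentioned in the thread


def cap_issues(
    issues: list[dict[str, Any]], max_warnings: int = MAX_WARNINGS
) -> tuple[list[dict[str, Any]], Optional[str]]:
    """Keep every non-warning issue and at most `max_warnings` warnings.

    Returns (kept_issues, notice); notice is None when nothing was dropped.
    """
    kept = []
    warnings_seen = 0
    for issue in issues:
        if issue.get("severity") == "warning":
            warnings_seen += 1
            if warnings_seen > max_warnings:
                continue  # over the cap: drop this warning
        kept.append(issue)

    notice = None
    if warnings_seen > max_warnings:
        # Hypothetical user-facing message, so truncation is never silent.
        notice = (
            f"Too many warnings detected ({warnings_seen}); showing the first "
            f"{max_warnings}. Run the validator locally for the full report."
        )
    return kept, notice
```

The point of returning the notice alongside the truncated list is that the caller can always show something to the user instead of leaving the status pending.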

ok, the validator output is ERROR-free now! There are still 100s of thousands of warnings, since it is a very rich dataset with lots of good files

here is a summary over those 50ish types of warnings with counts
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ jq -r '.issues.issues[] | "\(.code) \(.subCode) \(.severity)"' derivatives/bids-validator/deno-bids-validator.json | sort | uniq -c | sort -n | nl
     1        1 JSON_KEY_RECOMMENDED SourceDatasets warning
     2        5 SIDECAR_KEY_RECOMMENDED SliceTiming warning
     3        5 SIDECAR_KEY_RECOMMENDED TaskName warning
     4        9 SUSPICIOUSLY_LONG_EVENT_DESIGN null warning
     5       17 SUSPICIOUSLY_SHORT_EVENT_DESIGN null warning
     6       48 TSV_ADDITIONAL_COLUMNS_UNDEFINED question warning
     7      122 SIDECAR_KEY_RECOMMENDED DeviceSerialNumber warning
     8      122 SIDECAR_KEY_RECOMMENDED DwellTime warning
     9      122 SIDECAR_KEY_RECOMMENDED EchoTime warning
    10      122 SIDECAR_KEY_RECOMMENDED FlipAngle warning
    11      122 SIDECAR_KEY_RECOMMENDED InstitutionAddress warning
    12      122 SIDECAR_KEY_RECOMMENDED InstitutionalDepartmentName warning
    13      122 SIDECAR_KEY_RECOMMENDED MagneticFieldStrength warning
    14      122 SIDECAR_KEY_RECOMMENDED ManufacturersModelName warning
    15      122 SIDECAR_KEY_RECOMMENDED Manufacturer warning
    16      122 SIDECAR_KEY_RECOMMENDED PartialFourier warning
    17      122 SIDECAR_KEY_RECOMMENDED PulseSequenceDetails warning
    18      122 SIDECAR_KEY_RECOMMENDED ReceiveCoilActiveElements warning
    19      122 SIDECAR_KEY_RECOMMENDED ReceiveCoilName warning
    20      122 SIDECAR_KEY_RECOMMENDED ScanningSequence warning
    21      122 SIDECAR_KEY_RECOMMENDED ScanOptions warning
    22      122 SIDECAR_KEY_RECOMMENDED SequenceName warning
    23      122 SIDECAR_KEY_RECOMMENDED SequenceVariant warning
    24      122 SIDECAR_KEY_RECOMMENDED SoftwareVersions warning
    25      122 SIDECAR_KEY_RECOMMENDED StationName warning
    26      127 SIDECAR_KEY_RECOMMENDED InstitutionName warning
    27      239 SIDECAR_KEY_RECOMMENDED EffectiveEchoSpacing warning
    28      239 SIDECAR_KEY_RECOMMENDED PhaseEncodingDirection warning
    29      239 SIDECAR_KEY_RECOMMENDED TotalReadoutTime warning
    30     1813 TSV_COLUMN_TYPE_REDEFINED duration warning
    31     1969 SIDECAR_KEY_RECOMMENDED B0FieldIdentifier warning
    32     4178 SIDECAR_KEY_RECOMMENDED AcquisitionDuration warning
    33     4178 SIDECAR_KEY_RECOMMENDED CogPOID warning
    34     4178 SIDECAR_KEY_RECOMMENDED DelayAfterTrigger warning
    35     4178 SIDECAR_KEY_RECOMMENDED DelayTime warning
    36     4178 SIDECAR_KEY_RECOMMENDED Instructions warning
    37     4178 SIDECAR_KEY_RECOMMENDED NumberOfVolumesDiscardedByScanner warning
    38     4178 SIDECAR_KEY_RECOMMENDED NumberOfVolumesDiscardedByUser warning
    39     4178 SIDECAR_KEY_RECOMMENDED TaskDescription warning
    40     4221 SIDECAR_KEY_RECOMMENDED StimulusPresentation warning
    41     6065 SIDECAR_KEY_RECOMMENDED MultibandAccelerationFactor warning
    42    10352 SIDECAR_KEY_RECOMMENDED SliceEncodingDirection warning
    43    10474 SIDECAR_KEY_RECOMMENDED InversionTime warning
    44    10474 SIDECAR_KEY_RECOMMENDED ParallelReductionFactorInPlane warning
    45    10591 SIDECAR_KEY_RECOMMENDED CoilCombinationMethod warning
    46    10591 SIDECAR_KEY_RECOMMENDED GradientSetType warning
    47    10591 SIDECAR_KEY_RECOMMENDED MatrixCoilMode warning
    48    10591 SIDECAR_KEY_RECOMMENDED MixingTime warning
    49    10591 SIDECAR_KEY_RECOMMENDED MRTransmitCoilSequence warning
    50    10591 SIDECAR_KEY_RECOMMENDED MTState warning
    51    10591 SIDECAR_KEY_RECOMMENDED NumberShots warning
    52    10591 SIDECAR_KEY_RECOMMENDED ParallelAcquisitionTechnique warning
    53    10591 SIDECAR_KEY_RECOMMENDED ParallelReductionFactorOutOfPlane warning
    54    10591 SIDECAR_KEY_RECOMMENDED PartialFourierDirection warning
    55    10591 SIDECAR_KEY_RECOMMENDED PulseSequenceType warning
    56    10591 SIDECAR_KEY_RECOMMENDED SpoilingState warning
    57    14328 HED_WARNING SIDECAR_KEY_MISSING warning

With this -- we have done everything possible to make our dataset worthy of publishing publicly! And reviewers of Scientific Data demanded to get it published publicly due to errors and TODOs we used to have.

yarikoptic avatar Mar 07 '25 21:03 yarikoptic
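
For anyone without jq at hand, the per-type summary above can be approximated in Python. This sketch assumes only what the jq expression already shows: issues live under `.issues.issues[]` and each has `code`, `subCode` and `severity`. The function name `count_issue_types` is made up for illustration.

```python
from collections import Counter


def count_issue_types(report: dict) -> list[tuple[int, str]]:
    """Count issues per "code subCode severity" triple, sorted by ascending count."""

    def field(issue: dict, name: str) -> str:
        value = issue.get(name)
        # Mirror jq's string interpolation, which prints missing values as "null".
        return "null" if value is None else str(value)

    counts = Counter(
        f"{field(i, 'code')} {field(i, 'subCode')} {field(i, 'severity')}"
        for i in report["issues"]["issues"]
    )
    # Sort by count, then by label for a stable order among ties.
    return sorted((n, label) for label, n in counts.items())
```

Usage would be along the lines of `count_issue_types(json.load(open("derivatives/bids-validator/deno-bids-validator.json")))`, printing each `(count, label)` pair.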

I am starting to wonder if I should succumb to the dark ways of the .bidsignore... please do not let me go down that dark path

yarikoptic avatar Mar 13 '25 16:03 yarikoptic

Disabling the SIDECAR_KEY_RECOMMENDED warning allows the validation output to be saved; with it enabled, the output was running into the max size of the POST body at 50MB of JSON. I've done this manually now but we should see if we can address the JSON output size on the validator side somehow.

nellh avatar Mar 13 '25 20:03 nellh
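
To see which warning codes are worth disabling for size reasons, one could estimate how many bytes each code contributes to the serialized output. The sketch below is an assumption-laden illustration: `bytes_per_code` and `codes_to_drop` are hypothetical helpers, the 50MB limit is taken from the comment above (its exact byte value is a guess), and the estimate ignores the small overhead of the enclosing JSON array.

```python
import json
from collections import defaultdict

# 50MB POST body limit from the thread; the exact byte value is an assumption.
POST_LIMIT_BYTES = 50 * 1024 * 1024


def bytes_per_code(issues: list[dict]) -> dict[str, int]:
    """Approximate UTF-8 size of compact JSON contributed by each issue code."""
    sizes: dict[str, int] = defaultdict(int)
    for issue in issues:
        encoded = json.dumps(issue, separators=(",", ":")).encode("utf-8")
        sizes[str(issue.get("code"))] += len(encoded)
    return dict(sizes)


def codes_to_drop(issues: list[dict], limit: int = POST_LIMIT_BYTES) -> list[str]:
    """Pick the heaviest codes to drop, largest first, until the rest fits.

    Codes that include any error-severity issue are never dropped.
    """
    sizes = bytes_per_code(issues)
    error_codes = {str(i.get("code")) for i in issues if i.get("severity") == "error"}
    total = sum(sizes.values())
    dropped = []
    for code, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if total <= limit:
            break
        if code in error_codes:
            continue
        dropped.append(code)
        total -= size
    return dropped
```

On a report dominated by one warning type, this would be expected to suggest that code first, which matches the manual choice of SIDECAR_KEY_RECOMMENDED described above.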

Awesome, thank you @nellh ! #3184

and we also seem to have lots of TSV_COLUMN_TYPE_REDEFINED -- someone might have been "too thorough".

Image

yarikoptic avatar Mar 13 '25 21:03 yarikoptic

we still have Image

with Last Updated 2025-09-08 - 16 days ago which forbids publishing a new version.

how could we get over the hump to be able to publish a new version?

yarikoptic avatar Sep 24 '25 17:09 yarikoptic

Probably missing data:

# git annex list | grep -v ^X
here
|github
||s3-PUBLIC
|||web
||||bittorrent
|||||
_____ sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json
_____ sub-0019/ses-04/func/sub-0019_ses-04_task-fractional_acq-mb8_run-01_bold.json
_____ sub-0019/ses-04/func/sub-0019_ses-04_task-fractional_acq-mb8_run-02_bold.json

You may want to unannex those files and push them.

effigies avatar Sep 25 '25 14:09 effigies
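
The `git annex list` table above can also be checked programmatically. A minimal sketch, assuming the layout shown in the paste (header lines naming the remotes, then one line per file with a flag column per remote followed by a space and the path), and assuming `X`/`x` mean present and `_` means absent. `files_missing_everywhere` is a hypothetical helper name.

```python
def files_missing_everywhere(listing: str) -> list[str]:
    """Return paths whose flags are all "_" in `git annex list` output."""
    missing = []
    for line in listing.splitlines():
        flags, sep, path = line.partition(" ")
        # Data lines look like "X_X__ path"; header lines such as "here",
        # "|github" or "|||||" either have no space or contain other characters.
        if not sep or not flags or set(flags) - {"X", "x", "_"}:
            continue
        if "X" not in flags and "x" not in flags:
            missing.append(path)
    return missing
```

Feeding it the captured output of `git annex list` run on the server side would list exactly the files that no known location holds there.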

well -- those were annexed since "large" (over 100kb) and we had a rule

yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ tail -n 1 .gitattributes
* annex.largefiles=(largerthan=100kb)

and overall we had over 1000 json files added under git-annex for that reason (otherwise it would be about 140MB of json files alone to commit to git)

yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ find sub* -iname \*.json -type l | wc -l
1231

yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ find sub* -iname \*.json -type l | xargs du -scmL
...
142     total

Is it a requirement for all json files to be directly under git control to pass validation? (here I could force committing them into git, but ideally would prefer to avoid that)

AFAIK they are all on openneuro as well:

yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex find --not --in openneuro
yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ 

yarikoptic avatar Sep 27 '25 22:09 yarikoptic

I'm not sure what to do when your repo says we have it and ours says we don't. Can you fsck our remote?

effigies avatar Sep 28 '25 01:09 effigies

I have done a few rounds of git annex copy on those files in the hope that they arrive, and also dropped one of them and re-fetched it from the openneuro remote

(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex drop sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json
drop sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json (locking origin...) ok
(recording state in git...)
(deno) yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex get --from openneuro sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json
get sub-0001/ses-04/func/sub-0001_ses-04_task-fractional_acq-mb8_run-01_bold.json (from openneuro...) ok
(recording state in git...)

we pushed a bogus (empty) commit and "Validation Pending" is still there after 10 minutes. Could you please check whether it is the same issue or something new?

yarikoptic avatar Nov 17 '25 19:11 yarikoptic

fscked those files and git annex found them. I'm not sure if I forgot to do that last time, but either way it looks like they're here.

The validator may be timing out, or it might just be in the task queue (@nellh?). I ran it manually with --ignoreWarnings and got these errors:

	[ERROR] TSV_VALUE_INCORRECT_TYPE A value in a column did not match the acceptable type for that column headers specified format.
		trial_type
		/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-04_events.tsv - 'stimulus'
		/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-02_events.tsv - 'stimulus'

		1805 more files with the same issue

		rating_glmslabel
		/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-04_events.tsv - 'Very-Strong'
		/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-02_events.tsv - 'Barely-detectable'

		1493 more files with the same issue

		rating_glmslabel_fillna
		/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-04_events.tsv - 'Very-Strong'
		/sub-0038/ses-01/func/sub-0038_ses-01_task-social_acq-mb8_run-02_events.tsv - 'Barely-detectable'

		1576 more files with the same issue

		value
		/sub-0038/ses-04/func/sub-0038_ses-04_task-fractional_acq-mb8_run-02_events.tsv - 'falsebelief'
		/sub-0021/ses-04/func/sub-0021_ses-04_task-fractional_acq-mb8_run-01_events.tsv - 'falsebelief'

		49 more files with the same issue

		duration
		/sub-0084/ses-02/func/sub-0084_ses-02_task-faces_acq-mb8_run-01_events.tsv - '-0.017'
		/sub-0021/ses-02/func/sub-0021_ses-02_task-faces_acq-mb8_run-03_events.tsv - '-0.017'

		200 more files with the same issue

		event_type
		/sub-0084/ses-04/func/sub-0084_ses-04_task-fractional_acq-mb8_run-01_events.tsv - 'how_hand'
		/sub-0078/ses-04/func/sub-0078_ses-04_task-fractional_acq-mb8_run-01_events.tsv - 'how_hand'

		46 more files with the same issue

	Please visit https://neurostars.org/search?q=TSV_VALUE_INCORRECT_TYPE for existing conversations about this issue.

	[ERROR] HED_ERROR The validation on this HED string returned an error.
		VALUE_INVALID
		/sub-0084/ses-04/func/sub-0084_ses-04_task-fractional_acq-mb8_run-01_events.tsv - ERROR: [VALUE_INVALID] Invalid placeholder value for tag "Organizational-property/Control-variable/reaching?". Tag "Control-variable" has value classes [nameClass] but has value "reaching?" is not in any of them. TSV line: "2". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#value-invalid.)
		/sub-0084/ses-04/func/sub-0084_ses-04_task-fractional_acq-mb8_run-01_events.tsv - ERROR: [VALUE_INVALID] Invalid placeholder value for tag "Organizational-property/Control-variable/reaching?". Tag "Control-variable" has value classes [nameClass] but has value "reaching?" is not in any of them. TSV line: "3". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#value-invalid.)

		13822 more files with the same issue

	Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.

effigies avatar Nov 17 '25 20:11 effigies
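
The `duration` entries in the report above (for example '-0.017') are negative, and BIDS expects `duration` in events files to be zero or positive (or `n/a`). A small sketch for locating such rows locally before pushing; `negative_durations` is a hypothetical helper, and the line numbers it reports assume the file has no blank lines.

```python
import csv
import io


def negative_durations(tsv_text: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for rows whose `duration` is a negative number."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    bad = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        value = (row.get("duration") or "").strip()
        if value in ("", "n/a"):
            continue
        try:
            if float(value) < 0:
                bad.append((line_no, value))
        except ValueError:
            continue  # non-numeric values are a separate problem, ignored here
    return bad
```

Applied to the text of one of the listed `*_events.tsv` files, this would point at the rows to fix or to mark as `n/a`.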