"n/a" for required tag generates "Invalid JSON file"; "else" error?
Using the validator to check a publicly available PET dataset, I got a bunch of "Error 1: [Code 55] JSON_SCHEMA_VALIDATION_ERROR" errors with "should be number" as evidence. However, examining the flagged JSON files, I found that the 'invalid' values were n/a.
The spec says
Throughout BIDS you can indicate missing values with n/a (for "not available").
Now, as it happens, the errors are generated on required tags, and perhaps the spec should be clearer that "REQUIRED" isn't satisfied by supplying an n/a value.
But for the validator, it would be more informative if, when a required tag has a value but that value is "n/a", the error reflected that (e.g. "Required tag set to n/a value") instead of treating it as a generic low-level error.
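To illustrate the mismatch, here is a minimal sketch (pure Python, purely illustrative; this is not the validator's actual code, and the sidecar fragment is hypothetical):

```python
import json

def check_value(value):
    """Classify a sidecar value against a schema 'number' type (sketch)."""
    if value == "n/a":
        # A BIDS-aware check could report this case specifically ...
        return "Required tag set to n/a value"
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        # ... instead of only the generic schema mismatch:
        return "should be number"
    return "ok"

# Hypothetical sidecar fragment using the BIDS "n/a" convention:
sidecar = json.loads('{"ReconMethodParameterValues": ["n/a", 256]}')
for v in sidecar["ReconMethodParameterValues"]:
    print(check_value(v))
```

Because "n/a" is a JSON string, a generic schema validator can only report the type mismatch; the sketch shows where a BIDS-aware message could be produced instead.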
Also, among this dataset's other errors, there is one "Invalid JSON" error with evidence that I can't parse:
should match "else" schema
I googled around but couldn't figure out what this was supposed to mean... is it just a validator error?
Dataset is: https://www.nitrc.org/frs/download.php/12247/Dataset_BIDS.zip Full log attached
Perhaps the spec should be clearer that "REQUIRED" isn't satisfied by supplying an n/a value.
I agree that such a clarification would not hurt :+1:
But for the validator, it would be more informative if, when a required tag has a value but that value is "n/a", the error reflected that (e.g. "Required tag set to n/a value") instead of treating it as a generic low-level error.
I think that's a good suggestion :+1: However, that ties in with another feature that we have wanted for a long time now (e.g., #1073). We use https://json-schema.org/ to validate BIDS metadata: we specify a structure, datatype, or pattern in a JSON schema file, take a BIDS JSON file, and compare the two (the schema and the data). Unfortunately, this does not easily allow us to customize warnings or inject custom code in the right spots; it just tells us where the data and the schema don't match. We would want some code in between that analyzes the type of mismatch and then triages it into a set of specific warnings or hints.
I am convinced it's somehow possible, but it's not a built-in feature, and some level of expertise would be required to get this done. (But the impact would be big, and it would be awesome.)
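A rough sketch of that "code in between" idea (pure Python and purely illustrative; the real validator is JavaScript and the message wording here is made up):

```python
def triage(key, value, expected_type):
    """Turn a raw schema type-mismatch into a more specific hint (sketch)."""
    if value == "n/a":
        # The BIDS "n/a" convention deserves its own message:
        return f"Required tag '{key}' is set to the n/a value"
    # Otherwise fall back to the generic low-level wording:
    return f"{key} should be {expected_type}"

print(triage("ReconMethodParameterValues", "n/a", "number"))
print(triage("ReconMethodParameterValues", "fast", "number"))
```

The point is only the shape of the solution: intercept the generic mismatch reported by the schema library, inspect the offending value, and emit a curated message for known patterns.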
Also, among this dataset's other errors, there is one "Invalid JSON" error with evidence that I can't parse:
That's one more JSON schema problem :thinking: but it looks a bit tricky.
Type: Error
File: sub-00000016_ses-Normalized_pet.json
Location: dataset_bids/sub-00000016/ses-Normalized/pet/sub-00000016_ses-Normalized_pet.json
Reason: Invalid JSON file. The file is not formatted according the schema.
Evidence: .ReconMethodParameterValues[0] should be number
This error is straightforward: we can debug it by looking at the specification for ReconMethodParameterValues in the schema for PET: https://github.com/bids-standard/bids-validator/blob/e84e006adc7fbf548a95f6c40973603f8ae9cd07/bids-validator/validators/json/schemas/pet.json#L83
However, in your log I see some error messages that do not say anything about the JSON key (like ReconMethodParameterValues above) that triggered the problem. I wonder why ... :shrug:
Type: Error
File: sub-00000016_ses-Normalized_pet.json
Location: dataset_bids/sub-00000016/ses-Normalized/pet/sub-00000016_ses-Normalized_pet.json
Reason: Invalid JSON file. The file is not formatted according the schema.
Evidence: should have required property 'ReconFilterSize'
Now, finally, the problem above also applies to your `should match "else" schema` issue:
Type: Error
File: sub-00000016_ses-Normalized_pet.json
Location: dataset_bids/sub-00000016/ses-Normalized/pet/sub-00000016_ses-Normalized_pet.json
Reason: Invalid JSON file. The file is not formatted according the schema.
Evidence: should match "else" schema
It doesn't tell us the JSON key that triggered the problem, but the way the "evidence" is worded hints at a schema that uses if/else syntax. So you might be able to debug this by looking at this part of the schema: https://github.com/bids-standard/bids-validator/blob/e84e006adc7fbf548a95f6c40973603f8ae9cd07/bids-validator/validators/json/schemas/pet.json#L128-L195
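To make that wording concrete, here is a toy re-implementation of JSON Schema's if/then/else logic (the field names and the condition are illustrative, not copied from the real pet.json schema):

```python
def validate_if_else(data):
    """Toy check mimicking JSON Schema's if/then/else keywords (sketch)."""
    # if: the sidecar declares a reconstruction filter, the "then" branch
    # applies; otherwise the "else" branch does.
    if data.get("ReconFilterType", "none") != "none":
        branch, required = "then", ["ReconFilterSize"]
    else:
        branch, required = "else", ["ReconMethodName"]
    if any(key not in data for key in required):
        # A generic validator reports only which branch failed,
        # not which key inside that branch was missing:
        return f'should match "{branch}" schema'
    return "ok"

print(validate_if_else({}))  # fails the "else" branch's requirements
```

This is why the evidence reads `should match "else" schema`: the validator only knows the "else" subschema as a whole did not match, so the missing key never appears in the message.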
So overall: I think you made two good suggestions. The first can be implemented more or less easily in the BIDS specification; the second is a hard problem that we want to eventually solve for the validator, but we currently lack the right people for it (JS experience, validator experience, lots of time, and enthusiasm for these technical issues).
As for debugging, I hope my info was sufficient. Worst case, I can take another look - just let me know.
Thanks @sappelhoff! This is very enlightening... using a schema and generic validation tools makes sense, but it is frustrating how limiting that then is for giving nuanced feedback. Anyway, it sounds like these limitations are well understood, which is all I wanted to ensure.
I'll make a PR for the first point.
We are having issues creating a PET BIDS dataset for the same reason that @sappelhoff mentioned. The PET data was acquired with FDG, so certain fields, such as InjectedMass, InjectedMassUnits, SpecificRadioactivity, and SpecificRadioactivityUnits, are not available according to the current PET BIDS specs. However, setting them to "n/a" raised errors in the validator: the fields are 'required', but errors are raised if they are set to "n/a". What could be the solution before the PR is patched into the validator?
@szho42 you can check the expected values for PET in these two files:
- https://github.com/bids-standard/bids-validator/blob/master/bids-validator/validators/json/schemas/pet.json
- https://github.com/bids-standard/bids-validator/blob/master/bids-validator/validators/json/schemas/pet_blood.json
If you need to make progress fast now and cannot wait for a PR to fix this problem, I think you could try to set the values to unreasonable dummy values that will deliberately break any downstream analysis, and then document this really well in the README file. AND of course, go back and update the dataset once we have fixed the problem that some fields are REQUIRED but some systems simply don't provide/need this REQUIRED data.
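If you go the dummy-value route, a small helper can later list which fields still carry the sentinel, so the dataset is easy to clean up once the validator is fixed (a sketch; the sentinel value and the field names below are just examples):

```python
def find_sentinels(sidecar, sentinel=-1):
    """Return keys in a loaded JSON sidecar still set to the dummy sentinel."""
    return sorted(k for k, v in sidecar.items() if v == sentinel)

# Example sidecar with two stopgap values still in place:
sidecar = {"InjectedMass": -1, "InjectedMassUnits": "ug",
           "SpecificRadioactivity": -1}
print(find_sentinels(sidecar))  # ['InjectedMass', 'SpecificRadioactivity']
```

Running this over every `*_pet.json` in the dataset would give the "go back and update" checklist mentioned above.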
cc @mnoergaard do you have an opinion on this?
I think this is a good suggestion, @sappelhoff: setting the values to an unreasonable dummy value, e.g. -1, as a quick fix.
I will look more into this and try to come up with a solution asap. CC'ing @rwblair in case he might have some input.
@szho42 Would you be willing to submit an example dataset to https://github.com/bids-standard/bids-examples/? By having examples that cover as many of the cases allowed by the spec as possible, we can check that the validator is not rejecting good datasets.
@sappelhoff @mnoergaard Yeah, we currently adopted the same solution, setting the fields to values of 0, e.g.: "InjectedMass": 0, "InjectedMassUnits": "ug", "SpecificRadioactivity": 0, "SpecificRadioactivityUnits": "Bq/g"
Yeah, a value of -1 might be more suitable in this case.
@effigies Yeah, I will prepare a PR with a PET example.