bids-validator
bids-validator copied to clipboard
Validation crashes with code 0: internal error when JSON files begin with UTF-8 BOM
The validator should ignore the UTF-8 BOM at the start of JSON files to avoid crashing on parsing. Looks like this affects any JSON file but the specific case found on OpenNuero was for dataset_description.json
To reproduce just prepend 0xbbef to any dataset_description.json
bids-validator output:
1: [ERR] Internal error. SOME VALIDATION STEPS MAY NOT HAVE OCCURRED (code: 0 - INTERNAL ERROR)
.undefined
Evidence: TypeError: Cannot read property 'Authors' of null
at checkDatasetDescription (/home/nell/Git/crn/bids-validator/bids-validator/validators/bids/checkDatasetDescription.js:19:64)
at /home/nell/Git/crn/bids-validator/bids-validator/validators/bids/fullTest.js:135:40
Please visit https://neurostars.org/search?q=INTERNAL ERROR for existing conversations about this issue.
The actual parse error that gets eaten:
SyntaxError: Unexpected token in JSON at position 0
at JSON.parse (<anonymous>)
at /home/nell/Git/crn/bids-validator/bids-validator/utils/json.js:13:20
at new Promise (<anonymous>)
at Object.parse (/home/nell/Git/crn/bids-validator/bids-validator/utils/json.js:9:10)
at /home/nell/Git/crn/bids-validator/bids-validator/validators/json/load.js:17:36
xref #878
cc @hoechenberger who just recently taught me about this
The validator should ignore the UTF-8 BOM at the start of JSON files to avoid crashing on parsing
The BOM is an illegal character in JSON – and unnecessary, as JSON is by definition UTF-8-encoded. If the BOM is present, the input data is not valid JSON. Hence, it's okay if actually imperative that the validator spits an error. It shouldn't crash, though.
I did not know that, but this thread agrees: https://stackoverflow.com/questions/4990095/json-specification-and-usage-of-bom-charset-encoding
From RFC 8259:
Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
I still agree that it should be an error, not a crash, but it is not imperative in the spec that it be an error. It will just make life difficult for users to get a green light from the validator when any app may crash on it.
I agree it should not crash, but raise a more helpful error message about a potentially invalid JSON file. But in any case we should stop there and fail hard.