bids-validator icon indicating copy to clipboard operation
bids-validator copied to clipboard

Validation crashes with code 0: internal error when JSON files begin with UTF-8 BOM

Open nellh opened this issue 4 years ago • 6 comments

The validator should ignore the UTF-8 BOM at the start of JSON files to avoid crashing on parsing. Looks like this affects any JSON file but the specific case found on OpenNuero was for dataset_description.json

To reproduce just prepend 0xbbef to any dataset_description.json

bids-validator output:

	1: [ERR] Internal error. SOME VALIDATION STEPS MAY NOT HAVE OCCURRED (code: 0 - INTERNAL ERROR)
		.undefined
			Evidence: TypeError: Cannot read property 'Authors' of null
    at checkDatasetDescription (/home/nell/Git/crn/bids-validator/bids-validator/validators/bids/checkDatasetDescription.js:19:64)
    at /home/nell/Git/crn/bids-validator/bids-validator/validators/bids/fullTest.js:135:40

	Please visit https://neurostars.org/search?q=INTERNAL ERROR for existing conversations about this issue.

The actual parse error that gets eaten:

SyntaxError: Unexpected token  in JSON at position 0
    at JSON.parse (<anonymous>)
    at /home/nell/Git/crn/bids-validator/bids-validator/utils/json.js:13:20
    at new Promise (<anonymous>)
    at Object.parse (/home/nell/Git/crn/bids-validator/bids-validator/utils/json.js:9:10)
    at /home/nell/Git/crn/bids-validator/bids-validator/validators/json/load.js:17:36

nellh avatar Feb 16 '21 23:02 nellh

xref #878

effigies avatar Feb 17 '21 00:02 effigies

cc @hoechenberger who just recently taught me about this

sappelhoff avatar Feb 17 '21 08:02 sappelhoff

The validator should ignore the UTF-8 BOM at the start of JSON files to avoid crashing on parsing

The BOM is an illegal character in JSON – and unnecessary, as JSON is by definition UTF-8-encoded. If the BOM is present, the input data is not valid JSON. Hence, it's okay if actually imperative that the validator spits an error. It shouldn't crash, though.

hoechenberger avatar Feb 17 '21 09:02 hoechenberger

I did not know that, but this thread agrees: https://stackoverflow.com/questions/4990095/json-specification-and-usage-of-bom-charset-encoding

effigies avatar Feb 17 '21 12:02 effigies

From RFC 8259:

Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

I still agree that it should be an error, not a crash, but it is not imperative in the spec that it be an error. It will just make life difficult for users to get a green light from the validator when any app may crash on it.

effigies avatar Nov 28 '22 16:11 effigies

I agree it should not crash, but raise a more helpful error message about a potentially invalid JSON file. But in any case we should stop there and fail hard.

hoechenberger avatar Nov 28 '22 17:11 hoechenberger