configlet icon indicating copy to clipboard operation
configlet copied to clipboard

lint: consider the JSON parsing/deserialization design

Open ee7 opened this issue 3 years ago • 14 comments

Main options:

  1. std/json 1a. The approach so far: parse into a JsonNode and work only with that. 1b. Parse into a JsonNode, then unmarshall into some object using to. 1c. Plus std/jsonutils
  2. Araq/packedjson - keeps everything as a string. Lower memory usage than std/json, and sometimes faster.
  3. planetis-m/eminim - deserializes using std/streams directly to an object. Doesn't fully support object variants, but maybe that isn't a problem for us.
  4. status-im/nim-json-serialization - deserializes using nim-faststreams directly to an object. Probably the most mature third-party option. Currently has a large dependency tree, including chronos and bearssl.
  5. treeform/jsony - deserializes from string directly to an object.

(Note that disruptek/jason is serialization-only).

There are also some more obscure ones that I haven't tried, and don't know anything about:

Some of the above are possibly too lenient or require special handling in some edge cases.

Summary:

Library Permits a trailing comma? Permits comment? Duplicate key handling
Ruby stdlib json :x: :white_check_mark: Uses last value
std/json :white_check_mark: :white_check_mark: Uses last value
std/json patched :x: :x: Uses last value
packedjson :white_check_mark: :white_check_mark: Uses first value
eminim :white_check_mark: :white_check_mark: Uses last value
json_serialization :x: :white_check_mark: Uses last value
jsony :white_check_mark: :x: Uses last value

For example:

  • There is no correct behaviour for a duplicate key - one library may produce an error, another may silently use the value of the first key, and another may silently use the value of the last key.
  • std/json permits a trailing comma, and comments with // and /* */. This is the main reason that it took a while to tick the boxes for "the file must be valid JSON" in https://github.com/exercism/configlet/issues/249. But we now have own patched std/json with stricter parsing. configlet lint must exit with a non-zero exit code for a trailing comma because the Ruby library that parses it later produces an error for a trailing comma.
  • Some libraries may use a default value when the key is missing, which we might want to distinguish from e.g. a value that is the empty string.
  • Edge cases around a literal null.

See also:

  • https://einarwh.wordpress.com/2020/05/08/on-the-complexity-of-json-serialization/
  • https://labs.bishopfox.com/tech-blog/an-exploration-of-json-interoperability-vulnerabilities

I'd suggest that jsony or nim-json-serialization might be best in the long-term. But maybe it's better to stick with the current approach until we've implemented all the linting rules, and refactor it later.

One advantage of the current approach is that it's more low-level, which might better ensure that we're "checking the JSON file itself" rather than "checking that each value is valid when parsed with library X".

ee7 avatar May 06 '21 14:05 ee7