JSON.jl
Comment support
It would be nice if comments in JSON files were supported, so that a file like this:
{
// We choose this awesome value
"a": "value for key A"
}
would generate a dictionary like:
Dict("a" => "value for key A")
Alternatively, we could support the JSON5 standard.
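Until something like that exists, one stopgap is to strip comments before handing the text to a strict parser. This is a minimal sketch that assumes only // and /* */ comments and double-quoted strings; strip_json_comments is a made-up name for this example, not a JSON.jl function:

```julia
# Workaround sketch (not part of JSON.jl): drop // and /* */ comments that
# appear outside double-quoted strings so a strict JSON parser can read the
# result. Single-quoted strings and other JSON5 syntax are NOT handled.
function strip_json_comments(src::AbstractString)
    out = IOBuffer()
    chars = collect(src)
    i, n = 1, length(chars)
    instring = false
    while i <= n
        c = chars[i]
        if instring
            print(out, c)
            if c == '\\' && i < n
                print(out, chars[i+1])   # copy the escaped character verbatim
                i += 1
            elseif c == '"'
                instring = false
            end
            i += 1
        elseif c == '"'
            instring = true
            print(out, c)
            i += 1
        elseif c == '/' && i < n && chars[i+1] == '/'
            # line comment: skip up to, but not including, the newline
            while i <= n && chars[i] != '\n'
                i += 1
            end
        elseif c == '/' && i < n && chars[i+1] == '*'
            # block comment: search for the closing */
            j = i + 2
            while j < n && !(chars[j] == '*' && chars[j+1] == '/')
                j += 1
            end
            j >= n && error("unterminated block comment")
            print(out, ' ')   # keep the tokens on either side separated
            i = j + 2
        else
            print(out, c)
            i += 1
        end
    end
    return String(take!(out))
end
```

Usage would look like JSON.parse(strip_json_comments(read(path, String))), where path is a placeholder. This only addresses comments, not the other JSON5 relaxations.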
JSON does not have comments
+1 for JSON5 standard!
Pretty good AI-generated report of what would be required to support JSON5:
JSON5 Support Report
Notable JSON5 Differences
- JSON5 treats // single-line comments, /* … */ block comments, and extra ECMAScript whitespace characters (vertical tab, form feed, non-breaking space, U+2028/U+2029, the byte-order mark U+FEFF, and any other Unicode Space_Separator) as ignorable syntax, unlike strict JSON.
- Object member names may be unquoted identifier names or single-quoted strings, and object/array literals may end with trailing commas.
- Strings may be single- or double-quoted, support line continuations via \ followed by a line terminator, and add escapes such as \0, \v, and \xHH.
- Numbers accept a leading +, bare Infinity/-Infinity/NaN, hexadecimal integers (0x or 0X), and leading or trailing decimal points (e.g., .5, 10.). JSON5 does not add octal (0o) or binary (0b) literals, and leading zeros remain invalid.
- Because JSON5 is a superset of JSON, the writer can continue emitting strict JSON (always valid JSON5), but exposing JSON5-friendly output (comments, relaxed strings/numbers) would require explicit opt-in.
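The extended whitespace set maps closely onto Julia's own isspace. Here is a small sketch of a predicate a JSON5-aware skipper could use; isjson5space is a hypothetical name, and the set follows the WhiteSpace list summarized in the first bullet above:

```julia
# Hypothetical predicate (not part of JSON.jl) for JSON5 whitespace.
# Base.isspace already covers \t \n \v \f \r, space, and every Unicode Zs
# character (including NBSP). It also accepts U+0085 (NEL), which JSON5 does
# not, and rejects U+2028, U+2029 and the BOM U+FEFF, which JSON5 allows.
isjson5space(c::Char) = (isspace(c) && c != '\u85') || c in ('\u2028', '\u2029', '\ufeff')
```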
Parser Touchpoints (src/lazy.jl, src/utils.jl, src/parse.jl)
- Whitespace & comments: @nextbyte only skips ASCII space, tab, CR, and LF, and never recognizes the / comment prefix (src/utils.jl:61-78). Every caller, including the root scan in lazy (src/lazy.jl:91-120), the object/array loops (src/lazy.jl:262-344), and checkendpos (src/parse.jl:217-229), would need generalized skipping that also consumes JSON5 comment bodies and the additional whitespace code points. The jsonlines-specific macro (src/lazy.jl:309-345) likewise hardcodes the ASCII set and will choke on comment lines or NBSP delimiters.
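A byte-level sketch of that generalized skipping follows. skip_ws_comments is hypothetical (it is not @nextbyte), and to stay short it only handles ASCII whitespace plus VT/FF; the multibyte code points (NBSP, U+2028/2029, BOM, other Zs) would need UTF-8 decoding on top of this:

```julia
# Hypothetical helper, not JSON.jl code. Returns the index of the first byte
# at or after `pos` that is neither ASCII whitespace (incl. VT/FF) nor part
# of a // or /* */ comment. Multibyte whitespace and the U+2028/U+2029 line
# terminators are deliberately not handled in this sketch.
function skip_ws_comments(buf::AbstractVector{UInt8}, pos::Int)
    n = length(buf)
    while pos <= n
        b = buf[pos]
        if b == UInt8(' ') || b == UInt8('\t') || b == UInt8('\n') || b == UInt8('\r') ||
           b == 0x0b || b == 0x0c
            pos += 1
        elseif b == UInt8('/') && pos < n && buf[pos+1] == UInt8('/')
            # line comment: advance to the next LF/CR (or end of input)
            pos += 2
            while pos <= n && buf[pos] != UInt8('\n') && buf[pos] != UInt8('\r')
                pos += 1
            end
        elseif b == UInt8('/') && pos < n && buf[pos+1] == UInt8('*')
            # block comment: advance past the closing */
            pos += 2
            while true
                pos + 1 > n && error("unterminated block comment")
                if buf[pos] == UInt8('*') && buf[pos+1] == UInt8('/')
                    pos += 2
                    break
                end
                pos += 1
            end
        else
            break   # start of a real token (or a lone '/')
        end
    end
    return pos
end
```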
- Object keys & trailing commas: applyobject always calls parsestring for a key (src/lazy.jl:262-289), so unquoted identifiers and single-quoted keys
currently error. This block also assumes the next token after a value must be } or , and then immediately expects another key, so trailing commas fail
(src/lazy.jl:296-311). Similar logic in applyarray rejects trailing commas (src/lazy.jl:333-344). Supporting JSON5 requires: (1) a new parseidentifier
helper that consumes IdentifierName (including escape sequences) and feeds a PtrString; (2) branching in applyobject based on the leading byte (" vs '
vs identifier start); and (3) logic to permit a comma before the closing delimiter.
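Points (1) and (2) might look roughly like the sketch below. These helpers are hypothetical, work on raw bytes like the rest of the lazy parser, and accept only the ASCII subset of IdentifierName (no Unicode letters or \uXXXX escapes in keys):

```julia
# Hypothetical helpers, not JSON.jl code. Only ASCII identifier characters
# are recognized; full IdentifierName support needs Unicode classes and escapes.
isidstart(b::UInt8) = UInt8('a') <= b <= UInt8('z') || UInt8('A') <= b <= UInt8('Z') ||
                      b == UInt8('_') || b == UInt8('$')
isidpart(b::UInt8) = isidstart(b) || UInt8('0') <= b <= UInt8('9')

# Decide how an object key must be read, based on its first byte.
function key_kind(b::UInt8)
    b == UInt8('"') && return :double_quoted
    b == UInt8('\'') && return :single_quoted
    isidstart(b) && return :identifier
    return :invalid
end

# Scan an unquoted key starting at `pos`; returns (key, next_pos).
function parse_identifier(buf::AbstractVector{UInt8}, pos::Int)
    start = pos
    while pos <= length(buf) && isidpart(buf[pos])
        pos += 1
    end
    return String(buf[start:pos-1]), pos
end
```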
- String literals: parsestring enforces " as the opening delimiter and treats any control character (including line breaks) as invalid (src/lazy.jl:462-491). JSON5 needs the ability to start on ', accept \ line continuations, and allow the additional escape sequences. On the decode path, unsafe_unescape_to_buffer plus reverseescapechar only understand the JSON escape set (src/utils.jl:97-205), so they must be extended for \0, \v, \x, etc. On the encode path, _string always emits double quotes and ESCAPECHARS does not produce \' escapes; you can keep this for strict JSON output or add a JSON5 flag that switches quoting strategies (src/write.jl:653-675, src/utils.jl:83-134).
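For the decode side, here is a sketch of what the extended escape handling involves, written over Chars for clarity rather than over a byte buffer as unsafe_unescape_to_buffer is. unescape_json5 is a hypothetical name, and surrogate-pair combining is left out:

```julia
# Hypothetical decoder (not JSON.jl code) for the body of a JSON5 string
# literal, with the surrounding quotes already removed.
const SIMPLE_ESCAPES = Dict('b' => '\b', 'f' => '\f', 'n' => '\n', 'r' => '\r',
                            't' => '\t', 'v' => '\v', '0' => '\0')

function unescape_json5(body::AbstractString)
    out = IOBuffer()
    chars = collect(body)
    i, n = 1, length(chars)
    while i <= n
        c = chars[i]
        if c != '\\'
            print(out, c)
            i += 1
            continue
        end
        i == n && error("dangling backslash")
        e = chars[i+1]
        i += 2
        if haskey(SIMPLE_ESCAPES, e)
            # JSON5 forbids a digit right after \0; not checked here
            print(out, SIMPLE_ESCAPES[e])
        elseif e == 'x'
            i + 1 <= n || error("truncated \\x escape")
            print(out, Char(parse(UInt8, String(chars[i:i+1]); base=16)))
            i += 2
        elseif e == 'u'
            # surrogate pairs are not combined in this sketch
            i + 3 <= n || error("truncated \\u escape")
            print(out, Char(parse(UInt16, String(chars[i:i+3]); base=16)))
            i += 4
        elseif e == '\r'
            # line continuation; swallow the \n of a CRLF terminator too
            i <= n && chars[i] == '\n' && (i += 1)
        elseif e in ('\n', '\u2028', '\u2029')
            # line continuation: backslash and terminator are both dropped
        else
            # \" \' \\ \/ map to themselves; JSON5's ban on \1-\9 is not enforced
            print(out, e)
        end
    end
    return String(take!(out))
end
```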
- Number grammar & literals: _lazy only classifies a value as NUMBER when it sees - or a digit, unless allownan is set (which was intended solely for NaN/Inf parsing) (src/lazy.jl:205-236). JSON5's additional forms (a leading + or ., bare Infinity/NaN) require expanding this detection; hexadecimal literals already begin with the digit 0. The actual parser, parsenumber, assumes JSON's decimal grammar, rejects leading zeros, and only calls the relaxed @check_special paths when allownan is true (src/lazy.jl:547-645). You'll need either a dedicated JSON5 grammar (possibly leveraging Parsers.jl in a different mode) or separate fast paths: one for 0x/0X hexadecimal integers and one for decimal forms that allow a prefixed + or a missing leading or trailing digit. Leading zeros must stay invalid, since JSON5 has no legacy octal. Pay attention to how results are tagged (NumberResult) so downstream materialization still distinguishes ints from floats/bigints.
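A regex-based sketch of the JSON5 number grammar follows; it could serve as a reference oracle when testing a faster hand-written path. parse_json5_number is hypothetical. It returns Int64 for integers and Float64 otherwise, and never produces BigInts:

```julia
# Hypothetical reference scanner (not JSON.jl's parsenumber) for one complete
# JSON5 number token. Integers stay Int64; a '.' or exponent yields Float64.
const JSON5_SPECIAL = r"^([+-]?)(Infinity|NaN)$"
const JSON5_HEX = r"^([+-]?)0[xX]([0-9a-fA-F]+)$"
# 0 or a non-zero-led integer with optional fraction (possibly empty), or a
# bare fraction like .5; then an optional exponent. No leading zeros.
const JSON5_DECIMAL = r"^[+-]?(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?$"

function parse_json5_number(s::AbstractString)
    m = match(JSON5_SPECIAL, s)
    if m !== nothing
        v = m[2] == "NaN" ? NaN : Inf
        return m[1] == "-" ? -v : v
    end
    m = match(JSON5_HEX, s)
    if m !== nothing
        v = parse(Int64, m[2]; base=16)   # throws OverflowError past typemax(Int64)
        return m[1] == "-" ? -v : v
    end
    occursin(JSON5_DECIMAL, s) || throw(ArgumentError("invalid JSON5 number: $s"))
    return occursin(r"[.eE]", s) ? parse(Float64, s) : parse(Int64, s)
end
```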
- Root validation & options plumbing: LazyOptions and WriteOptions do not have a way to signal “JSON5 mode” (src/lazy.jl:73-83, src/write.jl:380-395).
Introducing such a flag lets you guard the new behavior while keeping strict JSON as the default. Remember to thread it through JSON.lazy, JSON.parse,
and JSON.parsefile so downstream APIs (e.g., StructUtils materialization, JSON.isvalidjson) inherit the relaxed grammar.
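To illustrate how a json5 flag could guard the number-detection change from the bullet above while strict JSON stays the default, here is a sketch in which SketchOptions and is_number_start are hypothetical stand-ins for LazyOptions and the check inside _lazy:

```julia
# Hypothetical options type; per the report, LazyOptions has no json5 field.
Base.@kwdef struct SketchOptions
    json5::Bool = false
end

# Would a value starting with byte `b` be classified as a number?
function is_number_start(b::UInt8, opts::SketchOptions)
    # strict JSON: '-' or a digit (this also covers the 0x hex prefix)
    (b == UInt8('-') || UInt8('0') <= b <= UInt8('9')) && return true
    # JSON5 only: a leading '+' or '.', and the bare Infinity / NaN identifiers
    return opts.json5 &&
           (b == UInt8('+') || b == UInt8('.') || b == UInt8('I') || b == UInt8('N'))
end
```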
- jsonlines: Because JSON5 allows trailing commas and comments even in array contexts, the jsonlines implicit-array logic (src/lazy.jl:309-345 and the
jsonlines branch of applyarray, src/lazy.jl:320-345) must also honor the same delimiter relaxations; otherwise mixed workloads (e.g., log streams with
inline comments) will fail.
Writer Touchpoints (src/write.jl)
- Keys & structural delimiters: checkkey enforces that lowered keys are AbstractString, and _string always writes double-quoted keys (src/write.jl:500-536, src/write.jl:653-675). That's valid JSON5, but emitting unquoted identifiers where possible would need logic to validate IdentifierNames and bypass quoting. Supporting optional trailing commas would require changes in WriteClosure, where commas are emitted eagerly and later overwritten (src/write.jl:502-554, 625-634).
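A sketch of the unquoted-key option follows: emit the key bare when it is a plain ASCII IdentifierName, otherwise fall back to a double-quoted string. write_json5_key is hypothetical, and the fallback escaping is deliberately minimal (quote, backslash, control characters). Reserved words need no special casing, because JSON5 keys use IdentifierName rather than Identifier:

```julia
# Hypothetical writer helper (not JSON.jl's _string). ASCII-only identifier
# check; r"" literals do not interpolate, so '$' is a literal character here.
const ASCII_IDENT = r"^[A-Za-z_$][A-Za-z0-9_$]*$"

function write_json5_key(io::IO, k::AbstractString)
    if occursin(ASCII_IDENT, k)
        print(io, k)   # safe to emit unquoted
        return
    end
    print(io, '"')
    for c in k
        if c == '"' || c == '\\'
            print(io, '\\', c)
        elseif c < ' '
            # control characters as \u00XX escapes
            print(io, "\\u", string(UInt16(c); base=16, pad=4))
        else
            print(io, c)
        end
    end
    print(io, '"')
end
```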
- Numbers and specials: _number refuses NaN/Inf unless allownan=true and never emits Infinity, NaN, or non-decimal integers by default (src/write.jl:699-735). JSON5 mode should flip the default, mapping Inf/-Inf/NaN to the bare identifiers the spec prescribes (Infinity, -Infinity, NaN) and optionally serializing integers in 0x hexadecimal form if the caller opts in.
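As a sketch of what the JSON5-mode number output could reduce to, consider the following, where json5_number and its hex keyword are hypothetical. For finite Float64 values it relies on string(::Float64), whose output (e.g. 1.5, 1.0e20, -0.0) is already valid JSON number syntax:

```julia
# Hypothetical JSON5 number formatting, not JSON.jl's _number.
function json5_number(x::Float64)
    isnan(x) && return "NaN"
    isinf(x) && return x > 0 ? "Infinity" : "-Infinity"
    return string(x)
end

# Integers: decimal by default, 0x hexadecimal on request. widen() avoids
# overflow in abs(typemin(Int64)).
json5_number(n::Integer; hex::Bool=false) =
    hex ? string(n < 0 ? "-" : "", "0x", string(abs(widen(n)); base=16)) : string(n)
```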
- Strings & escapes: _string has no notion of single quotes or line continuations. If you plan to preserve input style when round-tripping JSON5 (e.g., JSON.parse → JSON.json), you'll need metadata to remember whether a string was single-quoted or multiline. Otherwise, emitting strict JSON remains compliant, because every JSON document is valid JSON5.
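If style preservation is not needed but JSON5-flavored output is, one simple quoting strategy is to pick whichever quote character needs fewer escapes. Here is a sketch using a hypothetical write_json5_string, with minimal escaping:

```julia
# Hypothetical writer (not JSON.jl's _string): choose single quotes only when
# that saves escapes, then escape the chosen quote, backslashes, and controls.
function write_json5_string(io::IO, s::AbstractString)
    q = count(==('"'), s) > count(==('\''), s) ? '\'' : '"'
    print(io, q)
    for c in s
        if c == q || c == '\\'
            print(io, '\\', c)
        elseif c == '\n'
            print(io, "\\n")
        elseif c < ' '
            print(io, "\\u", string(UInt16(c); base=16, pad=4))
        else
            print(io, c)
        end
    end
    print(io, q)
end
```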
- Option surface: Exposing a mode=:json5 (or similar) keyword on JSON.json, json!, and WriteOptions allows you to switch defaults (e.g., allownan=true,
allow trailing commas/comments) without breaking current users. That flag should also control pretty-printer spacing so it can emit comment-aware
formatting if desired.
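One way to let a single mode switch flip defaults without overriding explicit choices is a keyword default that refers to an earlier keyword. In this sketch, sketch_write_options is hypothetical; per the report, neither JSON.json nor WriteOptions has a json5 option yet:

```julia
# Hypothetical keyword surface. Julia evaluates keyword defaults left to right,
# so allownan defaults to the value of json5 unless passed explicitly.
function sketch_write_options(; json5::Bool=false, allownan::Bool=json5)
    return (json5=json5, allownan=allownan)
end
```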
Implementation Considerations
1. Define a json5::Bool in both LazyOptions and WriteOptions, with API keywords (JSON.parse(...; json5=true), JSON.json(...; json5=true)) and propagate
it through StructUtils integrations.
2. Implement tokenizer-level support for comments, extended whitespace, identifiers, and literals. Updating @nextbyte, adding helper scanners (skip_comment!, parse_identifier!, parse_hex_number!), and teaching _lazy/parsenumber about the broader grammar centralizes the behavior so both lazy selection and full materialization benefit.
3. Extend the writer and tests: enable JSON5-friendly serialization in _number, consider optional identifier keys/trailing commas, and add regression tests that cover every JSON5 feature (comments, trailing commas, single-quoted strings, hexadecimal and relaxed decimal numbers, extended whitespace) for both parsing and (if supported) serialization.