
Parsing and serialization of control characters

Open j-moeller opened this issue 1 year ago • 5 comments

Hello,

We found that json.h parses and serializes raw control characters below 0x20, which technically violates the JSON grammar. We collected a minimal working sample here.

j-moeller avatar Jul 03 '24 08:07 j-moeller

Cannot see the sample (it 404's for me). Can you point me at the offending JSON grammar language that the lib is violating by any chance? Happy to have this fixed, but just wanna know where it says!

sheredom avatar Jul 03 '24 20:07 sheredom

Sorry, the repository was still set to "private". It should be public now.

I am referencing Section 7 "Strings" of RFC 8259 (https://datatracker.ietf.org/doc/html/rfc8259):

All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

If my understanding is correct, the control characters U+0000 through U+001F must be written as "\u0000" - "\u001f" to be valid JSON (except for U+0008, U+000C, U+000A, U+000D and U+0009, which may also be written as "\b", "\f", "\n", "\r" and "\t").
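As a minimal sketch of that rule (the helper name is hypothetical and not part of json.h), a strict parser could reject any raw byte below 0x20 that appears between the quotation marks of a string:

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of json.h.
 * Scans the bytes between the quotes of a JSON string and returns 1 if
 * none of them is a raw control character (0x00 through 0x1F), else 0.
 * Escape sequences such as \n or \u0001 consist only of printable bytes,
 * so they pass this check. */
static int string_body_has_no_raw_controls(const char *body, size_t len) {
  size_t i;
  for (i = 0; i < len; i++) {
    if ((unsigned char)body[i] < 0x20) {
      return 0; /* raw control character: invalid per RFC 8259, Section 7 */
    }
  }
  return 1;
}
```

With this check, the six printable bytes of the escape sequence \u0001 are accepted, while a single raw 0x01 byte is rejected.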

json.h behaves as follows:

These are expected to return parsing errors:

  • 0x01 - 0x07 are parsed and serialized back to 0x01 - 0x07
  • 0x08 is parsed and serialized back to "\u0008"
  • 0x0b is parsed and serialized back to 0x0b
  • 0x0c is parsed and serialized back to "\u000c"
  • 0x0e - 0x1f are parsed and serialized back to 0x0e - 0x1f

These are expected to be parsed and return "\u00xx":

  • "\u0001" - "\u0007" are parsed and serialized back to 0x01 - 0x07
  • "\u000b" is parsed and serialized back to 0x0b
  • "\u000e" - "\u001f" are parsed and serialized back to 0x0e - 0x1f
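For the serializing side, a sketch of the expected behavior could look like the following. The function name and signature are assumptions for illustration and do not reflect how json.h is actually structured:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, NOT part of json.h.
 * Writes the JSON escape sequence for control byte c (0x00 through 0x1F)
 * into out as a NUL-terminated string. Returns the number of characters
 * written (excluding the NUL), or 0 if c is not a control character or
 * the buffer holds fewer than 7 bytes. */
static size_t escape_control_char(unsigned char c, char *out, size_t cap) {
  const char *shorthand = NULL;
  if (c >= 0x20 || cap < 7) {
    return 0;
  }
  switch (c) {
    case '\b': shorthand = "\\b"; break;
    case '\f': shorthand = "\\f"; break;
    case '\n': shorthand = "\\n"; break;
    case '\r': shorthand = "\\r"; break;
    case '\t': shorthand = "\\t"; break;
    default: break;
  }
  if (shorthand != NULL) {
    out[0] = shorthand[0];
    out[1] = shorthand[1];
    out[2] = '\0';
    return 2;
  }
  /* Every other control character uses the six-character \u00xx form. */
  return (size_t)snprintf(out, cap, "\\u%04x", (unsigned)c);
}
```

This sketch prefers the two-character shorthand where one exists; always emitting the \u00xx form would be equally valid JSON.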

Note that because of the following statement in Section 9 "Parsers", json.h's parsing behavior is technically still permitted by the specification. So feel free to decide on the correct way to handle this.

A JSON parser MAY accept non-JSON forms or extensions.

j-moeller avatar Jul 04 '24 00:07 j-moeller

Nice summary, thanks! I think we'll fix this - seems worthwhile to err on the side of caution here.

I can take this change up if you wish, but happy to accept a PR if you'd rather do the coding!

sheredom avatar Jul 04 '24 03:07 sheredom

Hi, sorry for the late reply. Unfortunately, I am not that familiar with the code base, so I think it would be better if you implemented the necessary changes.

j-moeller avatar Jul 12 '24 11:07 j-moeller

Totally fine. When I get the time I'll look into it.

sheredom avatar Jul 12 '24 20:07 sheredom