stream-json

String literals with control characters, newlines, tabs should not parse

Open rictic opened this issue 1 year ago • 3 comments

I ran this library through https://github.com/nst/JSONTestSuite and it handled things well, matching the spec and the behavior of JSON.parse for all of the tests, save for these three, which all test essentially the same thing:

n_string_unescaped_ctrl_char.json
n_string_unescaped_newline.json
n_string_unescaped_tab.json

rictic avatar Oct 09 '24 01:10 rictic

It's entirely possible that this is an issue in my test harness. This is how I'm calling stream-json:

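const { chain } = require("stream-chain");
const { parser } = require("stream-json");
const nodeStream = require("stream");

// Feed a complete JSON text through stream-json and return the parsed value.
async function emulateJsonParse(str: string) {
  // Wrap the input string in a readable stream that ends immediately.
  const stream = new nodeStream.Readable();
  stream.push(str);
  stream.push(null);
  // Pipe the text through the streaming parser to get a token stream.
  const tokens = chain([stream, parser()]);
  // convertTokensToValue is defined elsewhere (not shown here).
  const parsedValue = await convertTokensToValue(tokens);
  return parsedValue;
}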

rictic avatar Oct 09 '24 01:10 rictic

The parser implements JSON according to the standard as it is defined here: https://www.json.org/json-en.html (the definition is in the right side panel). This decision is documented in the wiki. According to this document, characters in a string start from 0x20. Unescaped control characters (0x00-0x1F) are not allowed.

Am I missing something?

I am aware that there are a gazillion JSON standards. Some of them provide different twists on plain vanilla JSON. Some of them venture way outside it. One of them (JSONL) is supported by this package. I was asked about JSONC and JSON5, but at the moment I have no definite plans for them and, frankly, no real user requests for them.

Having said that, I can be persuaded to align with a different version of the JSON standard, provided there are good reasons for that. ;-)

uhop avatar Oct 09 '24 04:10 uhop

Agreed on all of those philosophical points. However, I think these three tests are correct in testing the spec as defined by json.org.

They check that a string literal containing an unescaped null, an unescaped newline, or an unescaped tab does not parse. Those are char codes 0x00, 0x0A, and 0x09, which are all less than 0x20 and so not legal characters.
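To make the rule concrete, here is a minimal sketch of the check these three tests exercise. It assumes only the json.org rule discussed above (no raw character below 0x20 inside a string literal). The name findUnescapedControlChar is hypothetical; this is not stream-json's code and not necessarily how any parser implements the check.

```javascript
// Hypothetical helper (not part of stream-json): report the first raw
// control character (code < 0x20) found inside a JSON string literal.
function findUnescapedControlChar(jsonText) {
  let inString = false;
  for (let i = 0; i < jsonText.length; i++) {
    const ch = jsonText[i];
    if (!inString) {
      // Outside strings, whitespace such as newlines and tabs is legal.
      if (ch === '"') inString = true;
      continue;
    }
    if (ch === "\\") {
      i++; // skip the character after the backslash, e.g. the quote in \"
      continue;
    }
    if (ch === '"') {
      inString = false;
      continue;
    }
    const code = ch.charCodeAt(0);
    if (code < 0x20) return { index: i, code };
  }
  return null;
}

// The JS escapes below produce raw control characters in the input text.
const nul = findUnescapedControlChar('["a\x00a"]');
const newline = findUnescapedControlChar('["a\na"]');
const tab = findUnescapedControlChar('["a\ta"]');
// Here the JSON text contains a backslash followed by "n", which is legal.
const escaped = findUnescapedControlChar('["a\\na"]');
```

The first three calls return a hit at index 3 with codes 0x00, 0x0A, and 0x09 respectively, while the properly escaped input returns null.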

Compare with the native JS JSON.parse, where all three of these expressions throw:

JSON.parse(`["a\x00a"]`)
JSON.parse(`["a\na"]`)
JSON.parse(`["a\ta"]`)
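For anyone who wants to reproduce this without the test suite, the following is a small self-contained sketch that records whether JSON.parse rejects each input. The helper name jsonParseRejects is made up for illustration, and the fourth input (a properly escaped newline) is added only as a contrast case.

```javascript
// Returns true when JSON.parse throws a SyntaxError for the given text.
function jsonParseRejects(text) {
  try {
    JSON.parse(text);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

const results = {
  // The JS string escapes below put raw control characters into the text.
  nul: jsonParseRejects('["a\x00a"]'),
  newline: jsonParseRejects('["a\na"]'),
  tab: jsonParseRejects('["a\ta"]'),
  // Contrast case: the text holds a backslash and "n", a valid JSON escape.
  escapedNewline: jsonParseRejects('["a\\na"]'),
};

console.log(results);
```

The first three entries come out true and escapedNewline comes out false, so the rejection is specifically about raw control characters inside the string.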

When I tried out stream-json, it matched JSON.parse's behavior for all of the tests in JSONTestSuite, save for these three.

rictic avatar Oct 09 '24 21:10 rictic

Published as 1.9.0. Please verify that it works for you.

uhop avatar Oct 23 '24 05:10 uhop