Input mapping is buggy
This package commits the cardinal sin of escaping: failing to fully exempt non-escaped input from unescaping (e.g., escapeKey escapes each . as \. but fails to escape \ itself). As a result, there are some injection bugs: https://jsfiddle.net/euqLwk4r/
input
({
a: [ "/'a'[0]: string that becomes a regex/" ],
'a.0': /'a.0': regex that becomes a string/,
'b.0': "/'b.0': string that becomes a regex/",
'b\\': [ /'b\\'[0]: regex that becomes a string/ ],
})
encodes to
{
"json": {
"a": [ "/'a'[0]: string that becomes a regex/" ],
"a.0": "/'a.0': regex that becomes a string/",
"b.0": "/'b.0': string that becomes a regex/",
"b\\": [ "/'b\\\\'[0]: regex that becomes a string/" ]
},
"meta": {
"values": {
"a.0": [ "regexp" ],
"b\\.0": [ "regexp" ]
}
}
}
which decodes as the input-dissimilar
({
a: [ /'a'[0]: string that becomes a regex/ ],
'a.0': "/'a.0': regex that becomes a string/",
'b.0': /'b.0': string that becomes a regex/,
'b\\': [ "/'b\\\\'[0]: regex that becomes a string/" ],
})
Thanks for the report, this makes sense to me. You're the first in quite a long time to report this, so at least it's not a giant impact. Do you see a way for us to fix this without breaking changes?
You're the first in quite a long time to report this, so at least it's not a giant impact.
So basically, no one is using this with data in which a property name contains dots or backslashes and the value is not already JSON-compatible, which seems likely.
Do you see a way for us to fix this without breaking changes?
I think that depends upon what you consider breaking. But in the strictest sense, breakage seems necessary because a meta name of "a.0" MUST be applied to json.a[0] (even though it could have come from input["a.0"]) and a meta name of "b\.0" MUST be applied to json["b.0"] (even though it could have come from input["b\"]).
But if you're trying to preserve backwards compatibility such that unaffected { json, meta } records produced by an old version can still be correctly decoded by a new version that fixes this issue, that would be possible—any meta name containing at least one dot would be subject to an O(n²) search for which dots are part of property names vs. path traversal (for efficiency, presumably starting with the assumption that they all represent path traversal if an interpretation as the new style[^1] fails to find a json node) and any path segment containing consecutive backslashes would be subject to double interpretation (presumably starting with new-style \\ → \). It would also make sense to require that every property in meta always finds a json node, and further support opting into a strict mode that uses only the new interpretation. And no matter what, I strongly recommend fuzzing to find any inputs that fail to round-trip.
[^1]: suggestion for new style:
js const pathSegments = pathString.match( // Match any possibly-empty combination of (non-dot, backslash with // optional escapee) that follows either start-of-string or a // non-escaped dot (i.e., that does not follow an escaped dot). /(?<=^|(?:^|[^\\])(?:\\\\)*[.])(?:[^.\\]|\\.?)*/gs, ).map(segment => segment.replace(/\\(.?)/gs, (_, ch) => { // The only valid escapes are `\.` and `\\`. if (ch === "." || ch === "\\") return ch; throw Error(`invalid path`); }));
Note that this specifically works for path strings that start or end with dots and/or backslashes, that contain invalid escapes (including "…\\\n…" <U+005C BACKSLASH, U+000A LINE FEED>), and that contain consecutive dots (i.e., empty path segments).
Thanks for the recommendation. The second form of backwards compat is what i'm after, I don't want the gazillion of persisted SuperJSON strings to suddenly start failing. I'd also like to keep usage of this library simple and straight-forward, so having an opt-in for something of this bugs level of detail is a no-go. I think we can ensure backwards compatibility by introducing a version tag that toggles old vs new parsing behaviour. https://github.com/flightcontrolhq/superjson/pull/311 implements that - could you give it a look?
Fuzzing sounds like a great idea btw.
Simple case here: dotted keys are not handled correctly
console.log(
transformer.deserialize(transformer.serialize({ "a.b.c": undefined })),
);
{a.b.c: null}