hhvm
hhvm copied to clipboard
[ Runtime | Extension ] json_decode parses invalid JSON strings and returns a result for them
Describe the bug
When given an invalid JSON string which is defective in one of the manners described below, json_decode()
returns a result. This JSON is not strictly valid according to the spec and should ideally result in a parse error.
Standalone code, or other way to reproduce the problem
function json_decode_adapted(string $json, bool $assoc): (string, ?string) {
$error = null;
$return = json_decode_with_error(
$json,
inout $error,
$assoc,
512,
\JSON_FB_HACK_ARRAYS,
);
$err_str = $error is null ? null : $error[1];
return tuple(json_encode($return), $err_str);
}
function json_decode_both(string $name, string $json): void {
list($std_res, $std_err) = json_decode_adapted($json, false);
list($assoc_res, $assoc_err) = json_decode_adapted($json, true);
printf("======= %s =======\n", $name);
if ($std_err is string && $assoc_err is string) {
printf(
"Both stdClass and assoc mode rejected this, stdClass: %s, assoc: %s\n\n",
$std_err,
$assoc_err,
);
return;
}
if ($std_err !== $assoc_err) {
printf(
"stdClass mode and assoc mode disagree about the error\nstdClass: %s\nassoc: %s\n\n",
$std_err ?? '<no error>',
$assoc_err ?? '<no error>',
);
}
if ($std_res !== $assoc_res) {
printf(
"The results differ\nstdClass: %s\nassoc: %s\n\n",
$std_res,
$assoc_res,
);
} else {
printf("The result %s\n\n", $assoc_res);
}
}
const dict<string, string> EXAMPLES = dict[
'single minus token' => '-',
'minus token in array' => '[-]',
'invalid numeric literal' => '01',
'negative invalid numeric literal' => '-01',
'invalid numeric literal in array' => '[01]',
'negative invalid numeric literal in array' => '[-01]',
'form feed as whitespace in array' => "[1\f,2]",
'form feed as whitespace in object' => <<<END
\f{\f"bad"\f:\f"result"\f}\f
END
];
<<__EntryPoint>>
function main(): void {
foreach (EXAMPLES as $name => $json) {
json_decode_both($name, $json);
}
}
Steps to reproduce the behavior:
- Run the script above
- The correct result would be
Both stdClass and assoc mode rejected this...
for every decode attempt - Observe the results (see Actual behavior)
Expected behavior
All these json snippets are syntactically invalid, but not all of them are rejected. These examples should all be a Syntax error
or a Control character error
.
Actual behavior
======= single minus token =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>
The results differ
stdClass: null
assoc: 0
======= minus token in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>
The results differ
stdClass: null
assoc: [0]
======= invalid numeric literal =======
The result 1
======= negative invalid numeric literal =======
The result -1
======= invalid numeric literal in array =======
Both stdClass and assoc mode rejected this, stdClass: Syntax error, assoc: Syntax error
======= negative invalid numeric literal in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>
The results differ
stdClass: null
assoc: [-1]
======= form feed as whitespace in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Control character error, possibly incorrectly encoded
assoc: <no error>
The results differ
stdClass: null
assoc: [1,2]
======= form feed as whitespace in object =======
stdClass mode and assoc mode disagree about the error
stdClass: Control character error, possibly incorrectly encoded
assoc: <no error>
The results differ
stdClass: null
assoc: {"bad":"result"}
Environment
- Operating system
Ubuntu 20.04
- Installation method
apt-get with dl.hhvm.com repository (also tested in Docker)
- HHVM Version
HipHop VM 4.88.0 (rel)
Compiler: 1608056886_380600296
Repo schema: d1ae8e21bf3419a65f12a010527485564e719d07
hackc-9e3bb09ecef0a93ae0917fb44ed4e2434e990c17-4.88.0
Additional context The list of examples is not meant to be exhaustive.
The following commits have improved SimpleParser to make it more spec compliant:
- https://github.com/facebook/hhvm/commit/bda0d7a6ff94473d923390cf15f8f600eac71acb
- https://github.com/facebook/hhvm/commit/b9c31ba845fb766be7bc1778eb3a51c47fae0b84
- https://github.com/facebook/hhvm/commit/d94ac376d925a0b6a8e0e27a1fa8b8e9b08af9b2
- https://github.com/facebook/hhvm/commit/4eb526474a17dd06c8e3386f6d148c27e4a68ab0
- https://github.com/facebook/hhvm/commit/d45c940a9aec162a81d2c812eb644148fa97a758
This changes the result of the test cases:
HipHop VM 4.139.0 (rel) (non-lowptr)
Compiler: 1639070357_975296023
Repo schema: 95cede5f0ee10b1712ae18644e4fcfd0f4bd9560
======= single minus token =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>
The results differ
stdClass: null
assoc: 0
======= minus token in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>
The results differ
stdClass: null
assoc: [0]
======= invalid numeric literal =======
The result 1
======= negative invalid numeric literal =======
The result -1
======= invalid numeric literal in array =======
Both stdClass and assoc mode rejected this, stdClass: Syntax error, assoc: Syntax error
======= negative invalid numeric literal in array =======
Both stdClass and assoc mode rejected this, stdClass: Syntax error, assoc: Syntax error
======= form feed as whitespace in array =======
Both stdClass and assoc mode rejected this, stdClass: Control character error, possibly incorrectly encoded, assoc: Control character error, possibly incorrectly encoded
======= form feed as whitespace in object =======
Both stdClass and assoc mode rejected this, stdClass: Control character error, possibly incorrectly encoded, assoc: Control character error, possibly incorrectly encoded
The test cases single minus token
, minus token in array
are still conditionally correct based on the assoc mode. invalid numeric literal
and negative invalid numeric literal
are still always incorrect.
Note, this is the only json parser left. The optionally compiled in json-c parser has been removed in this commit. ef919bb81cc2326a24d07bb9b7b7574eced09366