hhvm icon indicating copy to clipboard operation
hhvm copied to clipboard

[ Runtime | Extension ] json_decode parses invalid JSON strings and returns a result for them

Open lexidor opened this issue 4 years ago • 1 comments

Describe the bug When given an invalid JSON string which is defective in one of the manners described below, json_decode() returns a result. This JSON is not strictly valid according to the spec and should ideally result in a parse error.

Standalone code, or other way to reproduce the problem

function json_decode_adapted(string $json, bool $assoc): (string, ?string) {
  $error = null;
  $return = json_decode_with_error(
    $json,
    inout $error,
    $assoc,
    512,
    \JSON_FB_HACK_ARRAYS,
  );
  $err_str = $error is null ? null : $error[1];
  return tuple(json_encode($return), $err_str);
}

function json_decode_both(string $name, string $json): void {
  list($std_res, $std_err) = json_decode_adapted($json, false);
  list($assoc_res, $assoc_err) = json_decode_adapted($json, true);
  printf("======= %s =======\n", $name);

  if ($std_err is string && $assoc_err is string) {
    printf(
      "Both stdClass and assoc mode rejected this, stdClass: %s, assoc: %s\n\n",
      $std_err,
      $assoc_err,
    );
    return;
  }

  if ($std_err !== $assoc_err) {
    printf(
      "stdClass mode and assoc mode disagree about the error\nstdClass: %s\nassoc: %s\n\n",
      $std_err ?? '<no error>',
      $assoc_err ?? '<no error>',
    );
  }

  if ($std_res !== $assoc_res) {
    printf(
      "The results differ\nstdClass: %s\nassoc: %s\n\n",
      $std_res,
      $assoc_res,
    );
  } else {
    printf("The result %s\n\n", $assoc_res);
  }
}

const dict<string, string> EXAMPLES = dict[
  'single minus token' => '-',
  'minus token in array' => '[-]',
  'invalid numeric literal' => '01',
  'negative invalid numeric literal' => '-01',
  'invalid numeric literal in array' => '[01]',
  'negative invalid numeric literal in array' => '[-01]',
  'form feed as whitespace in array' => "[1\f,2]",
  'form feed as whitespace in object' => <<<END
\f{\f"bad"\f:\f"result"\f}\f
END
];

<<__EntryPoint>>
function main(): void {
  foreach (EXAMPLES as $name => $json) {
    json_decode_both($name, $json);
  }
}

Steps to reproduce the behavior:

  1. Run the script above
  2. The correct result would be Both stdClass and assoc mode rejected this... for every decode attempt
  3. Observe the results (see Actual behavior)

Expected behavior

All these json snippets are syntactically invalid, but not all of them are rejected. These examples should all be a Syntax error or a Control character error.

Actual behavior

======= single minus token =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>

The results differ
stdClass: null
assoc: 0

======= minus token in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>

The results differ
stdClass: null
assoc: [0]

======= invalid numeric literal =======
The result 1

======= negative invalid numeric literal =======
The result -1

======= invalid numeric literal in array =======
Both stdClass and assoc mode rejected this, stdClass: Syntax error, assoc: Syntax error

======= negative invalid numeric literal in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>

The results differ
stdClass: null
assoc: [-1]

======= form feed as whitespace in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Control character error, possibly incorrectly encoded
assoc: <no error>

The results differ
stdClass: null
assoc: [1,2]

======= form feed as whitespace in object =======
stdClass mode and assoc mode disagree about the error
stdClass: Control character error, possibly incorrectly encoded
assoc: <no error>

The results differ
stdClass: null
assoc: {"bad":"result"}

Environment

  • Operating system

Ubuntu 20.04

  • Installation method

apt-get with dl.hhvm.com repository (also tested in Docker)

  • HHVM Version
HipHop VM 4.88.0 (rel)
Compiler: 1608056886_380600296
Repo schema: d1ae8e21bf3419a65f12a010527485564e719d07
hackc-9e3bb09ecef0a93ae0917fb44ed4e2434e990c17-4.88.0

Additional context The list of examples is not meant to be exhaustive.

lexidor avatar Jan 19 '21 01:01 lexidor

The following commits have improved SimpleParser to make it more spec compliant:

  • https://github.com/facebook/hhvm/commit/bda0d7a6ff94473d923390cf15f8f600eac71acb
  • https://github.com/facebook/hhvm/commit/b9c31ba845fb766be7bc1778eb3a51c47fae0b84
  • https://github.com/facebook/hhvm/commit/d94ac376d925a0b6a8e0e27a1fa8b8e9b08af9b2
  • https://github.com/facebook/hhvm/commit/4eb526474a17dd06c8e3386f6d148c27e4a68ab0
  • https://github.com/facebook/hhvm/commit/d45c940a9aec162a81d2c812eb644148fa97a758

This changes the result of the test cases:

HipHop VM 4.139.0 (rel) (non-lowptr)
Compiler: 1639070357_975296023
Repo schema: 95cede5f0ee10b1712ae18644e4fcfd0f4bd9560

======= single minus token =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>

The results differ
stdClass: null
assoc: 0

======= minus token in array =======
stdClass mode and assoc mode disagree about the error
stdClass: Syntax error
assoc: <no error>

The results differ
stdClass: null
assoc: [0]

======= invalid numeric literal =======
The result 1

======= negative invalid numeric literal =======
The result -1

======= invalid numeric literal in array =======
Both stdClass and assoc mode rejected this, stdClass: Syntax error, assoc: Syntax error

======= negative invalid numeric literal in array =======
Both stdClass and assoc mode rejected this, stdClass: Syntax error, assoc: Syntax error

======= form feed as whitespace in array =======
Both stdClass and assoc mode rejected this, stdClass: Control character error, possibly incorrectly encoded, assoc: Control character error, possibly incorrectly encoded

======= form feed as whitespace in object =======
Both stdClass and assoc mode rejected this, stdClass: Control character error, possibly incorrectly encoded, assoc: Control character error, possibly incorrectly encoded

The test cases single minus token, minus token in array are still conditionally correct based on the assoc mode. invalid numeric literal and negative invalid numeric literal are still always incorrect.

lexidor avatar Dec 09 '21 19:12 lexidor

Note, this is the only json parser left. The optionally compiled in json-c parser has been removed in this commit. ef919bb81cc2326a24d07bb9b7b7574eced09366

lexidor avatar Jun 13 '23 06:06 lexidor