ultrajson icon indicating copy to clipboard operation
ultrajson copied to clipboard

UltraJSON accepts invalid json

Open Zerogoki00 opened this issue 2 years ago • 3 comments

What did you do?

Tried to loads this invalid json {"a": -}'

What did you expect to happen?

I expected some exception to be raised

What actually happened?

I got {'a': 0} result

What versions are you using?

  • OS: Arch Linux
  • Python: Python 3.10.5
  • UltraJSON: ujson 5.3.0

Please include code that reproduces the issue.

The best reproductions are self-contained scripts with minimal dependencies.

Python 3.10.5 (main, Jun  6 2022, 18:49:26) [GCC 12.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import ujson
>>> json.loads('{"a": -}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 7 (char 6)
>>> ujson.loads('{"a": -}')
{'a': 0}

According to JSON's RFC https://datatracker.ietf.org/doc/html/rfc7159#section-6 standalone minus isn't a valid number. Builtin python json module doesn't accept this json and raises exception. But ujson doesn't raise any exception and parses minus sign as 0. I think it's a bug

Zerogoki00 avatar Jun 24 '22 15:06 Zerogoki00

Yeah, this was known internally for a while. The negative sign triggers the number detection code, but it never verifies that it has seen at least one digit. The same kind of behaviour also exists with json.loads('-.1') or '-.1e1', by the way.

Note that, while all of these are obviously invalid JSON, the spec allows parsers to accept non-JSON inputs (in section 9). So it's not a violation of the spec.

Since this doesn't have any effect on parsing valid JSON, can't be abused for nefarious purposes as far as I can tell, and would require a special check that would hardly ever get triggered and would therefore usually only negatively impact performance of the number parsing code, I think this is a quirk that we can live with. Although I'm curious how an UNLIKELY(...) check would perform.

JustAnotherArchivist avatar Jun 24 '22 18:06 JustAnotherArchivist

Briefly tested this:

--- a/lib/ultrajsondec.c
+++ b/lib/ultrajsondec.c
@@ -114,6 +114,10 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric (struct DecoderState *ds
     {
       goto DECODE_INF;
     }
+    if (UNLIKELY(*(offset) < '0' || *(offset) > '9'))
+    {
+      return SetError(ds, -1, "Unexpected character found when decoding number");
+    }
     maxIntValue = -(JSUINT64) LLONG_MIN;
     overflowLimit = maxIntValue / 10LL;
   }

When parsing an array consisting of ten thousand -1s, probably one of the worst cases as that additional check needs to run on every third character, this increases the execution time by about 1.6 % for me, from 186 to 189 μs (python3 -m timeit -s 'import ujson; o = "[" + "-1,"*9999 + "-1]"; l = ujson.loads' 'l(o)').

JustAnotherArchivist avatar Jun 24 '22 22:06 JustAnotherArchivist

I think I'd still prefer to leave it as it is now since, whilst the cost of fixing it is small, the cost of not fixing is zero as far as I can tell. But it's a tiny thing either way. I'm not fussy if someone else feels more strongly about it.

bwoodsend avatar Jun 25 '22 15:06 bwoodsend