ultrajson
UltraJSON accepts invalid JSON
What did you do?
Tried to load this invalid JSON: {"a": -}
What did you expect to happen?
I expected some exception to be raised
What actually happened?
I got {'a': 0}
What versions are you using?
- OS: Arch Linux
- Python: Python 3.10.5
- UltraJSON: ujson 5.3.0
Please include code that reproduces the issue.
The best reproductions are self-contained scripts with minimal dependencies.
Python 3.10.5 (main, Jun 6 2022, 18:49:26) [GCC 12.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import ujson
>>> json.loads('{"a": -}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 7 (char 6)
>>> ujson.loads('{"a": -}')
{'a': 0}
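For completeness, here is the session above as a self-contained script. It guards the ujson import so the stdlib part runs even where ujson isn't installed:

```python
import json

# A bare minus sign is not a valid JSON number.
doc = '{"a": -}'

# The stdlib parser rejects it, as expected.
try:
    json.loads(doc)
    print("json: accepted (unexpected)")
except json.JSONDecodeError as exc:
    print(f"json: rejected ({exc})")

# ujson 5.3.0 accepts it and parses the minus sign as 0.
try:
    import ujson
    print("ujson:", ujson.loads(doc))  # {'a': 0} on 5.3.0
except ImportError:
    print("ujson not installed")
```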
According to JSON's RFC (https://datatracker.ietf.org/doc/html/rfc7159#section-6), a standalone minus sign isn't a valid number. The built-in Python json module rejects this input and raises an exception, but ujson doesn't raise anything and parses the minus sign as 0. I think it's a bug.
Yeah, this has been known internally for a while. The negative sign triggers the number-detection code, but that code never verifies that it has seen at least one digit. The same kind of behaviour also exists with ujson.loads('-.1') or '-.1e1', by the way.
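For reference, the stdlib decoder rejects every one of these minus-without-digit forms, so they are easy to check with a stdlib-only snippet:

```python
import json

# A minus sign must be followed by at least one digit per RFC 7159's
# number grammar; the stdlib decoder rejects all of these.
for doc in ['-', '-.1', '-.1e1', '{"a": -}']:
    try:
        json.loads(doc)
        print(f"{doc!r}: accepted")
    except json.JSONDecodeError:
        print(f"{doc!r}: rejected")
```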
Note that, while all of these are obviously invalid JSON, the spec allows parsers to accept non-JSON inputs (in section 9). So it's not a violation of the spec.
Since this doesn't affect the parsing of valid JSON, can't be abused for nefarious purposes as far as I can tell, and would require a special check that would hardly ever trigger and would therefore usually only hurt the performance of the number-parsing code, I think this is a quirk we can live with. Although I'm curious how an UNLIKELY(...) check would perform.
Briefly tested this:
--- a/lib/ultrajsondec.c
+++ b/lib/ultrajsondec.c
@@ -114,6 +114,10 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric (struct DecoderState *ds
{
goto DECODE_INF;
}
+ if (UNLIKELY(*(offset) < '0' || *(offset) > '9'))
+ {
+ return SetError(ds, -1, "Unexpected character found when decoding number");
+ }
maxIntValue = -(JSUINT64) LLONG_MIN;
overflowLimit = maxIntValue / 10LL;
}
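The logic of that patch can be illustrated with a toy Python sketch (not ujson's actual code): after consuming an optional sign, a strict number parser must see at least one digit before it starts accumulating a value.

```python
def parse_int(s: str) -> int:
    """Toy strict integer parser illustrating the missing-digit check."""
    i = 0
    negative = s.startswith('-')
    if negative:
        i += 1
    # This is the check the patch adds: a sign followed by no digit is
    # an error, not the number 0.
    if i >= len(s) or not s[i].isdigit():
        raise ValueError("Unexpected character found when decoding number")
    value = 0
    while i < len(s) and s[i].isdigit():
        value = value * 10 + (ord(s[i]) - ord('0'))
        i += 1
    return -value if negative else value
```

Without that check, an input of just '-' would fall through with value still at 0, which is exactly how ujson ends up returning {'a': 0}.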
When parsing an array of ten thousand -1s, probably one of the worst cases since the additional check has to run on every third character, this increases the execution time by about 1.6 % for me, from 186 to 189 μs (python3 -m timeit -s 'import ujson; o = "[" + "-1,"*9999 + "-1]"; l = ujson.loads' 'l(o)').
I think I'd still prefer to leave it as it is, since, whilst the cost of fixing it is small, the cost of not fixing it is zero as far as I can tell. But it's a tiny thing either way; I'm not fussy if someone else feels more strongly about it.