ultrajson icon indicating copy to clipboard operation
ultrajson copied to clipboard

Add ability to serialise Decimal as string

Open ameyarao98 opened this issue 1 year ago • 15 comments

Currently Decimal serialises to floats by default, which is undesirable when working with money for instance. The default option cannot be used for this since it does not raise an error, so a clean way to be able to specify that Decimal should be serialised as a string in dumps would be very helpful

ameyarao98 avatar Jul 15 '23 12:07 ameyarao98

To be clear, the expected behaviour is:

>>> ujson.dumps(decimal.Decimal("1.123123123123123123123123123"))
'"1.123123123123123123123123123"'

i.e. the output is a JSON string and there's no rounding?

bwoodsend avatar Jul 15 '23 12:07 bwoodsend

And setting the default parameter to something that calls str() if given a Decimal() doesn't solve that problem because the default is only called if ujson can't handle the value on its own and ujson thinks that converting Decimal() instances to floats is intended handling.

bwoodsend avatar Jul 15 '23 12:07 bwoodsend

Current behaviour:

>>> ujson.dumps({"amount": Decimal("15.35")})
'{"amount":15.35}'

Expected behaviour:

>>> ujson.dumps({"amount": Decimal("15.35")}, decimal_as_string=True)
'{"amount":"15.35"}'

ameyarao98 avatar Jul 15 '23 13:07 ameyarao98

I would argue that the dumping behaviour is correct. Numbers in JSON are not floats but arbitrarily long sequences of digits.

For parsing, we should support parse_float = decimal.Decimal as is mentioned in the stdlib json documentation. There is already an open issue for that: #401

JustAnotherArchivist avatar Jul 15 '23 16:07 JustAnotherArchivist

Honestly, I'd rather delete the current handling of Decimal() and force people to either normalize their data into standard objects before passing it to ujson.dumps() or set default to a function using either str() or float(). This is the same issue as serialising datetime()s – it's ambiguous and shouldn't be the job of any JSON library to guess appropriate approximations of incompatible types.

bwoodsend avatar Jul 15 '23 16:07 bwoodsend

I disagree. decimal.Decimal can be directly mapped to a JSON number. It's just a -?\d+(\.\d+)?(E[+-]\d+)? number in almost all cases. There are two special cases, NaN and Infinity (positive or negative), which we already support for floats anyway as an unambiguous extension to JSON. Positive and negative zero are not a problem either. So encoding to JSON is not ambiguous. Parsing is a different story, but we rightly don't do anything there.

The current implementation is definitely not right though since it converts to a float first. This results in rounding and loss of number of significant figures.

JustAnotherArchivist avatar Jul 15 '23 17:07 JustAnotherArchivist

Some examples of what I would expect:

>>> ujson.dumps(decimal.Decimal('1.230'))
'1.230'
>>> ujson.dumps(decimal.Decimal('1.23000000000000001'))
'1.23000000000000001'
>>> ujson.dumps(decimal.Decimal('0.123456789012345678901234567890'))
'0.123456789012345678901234567890'
>>> ujson.dumps(decimal.Decimal('-0'))
'-0'
>>> ujson.dumps(decimal.Decimal('-0.00'))
'-0.00'
>>> ujson.dumps(decimal.Decimal('1e1'))
'1E+1'  # or equivalent; this is the Python 3.10 decimal normalisation behaviour

JustAnotherArchivist avatar Jul 15 '23 17:07 JustAnotherArchivist

I guess that if we went down the route of decimals written exactly as JSON numbers then we'd have to implement #401 or it's all a bit of a pointless exercise.

Sod it, I don't mind which route we choose.

bwoodsend avatar Jul 15 '23 17:07 bwoodsend

It wouldn't be entirely pointless, but yeah, I agree we should implement that anyway (if only for compatibility with stdlib).

I looked a bit into what would be needed for this. str(decimal.Decimal(...)) isn't documented really. I think it should always produce a valid JSON number, and simplejson uses it, so it's probably safe enough, but there's no documentation of the exact format. The relevant code for it is this, but I haven't read through it yet. The alternative would be to use decimal.Decimal.as_tuple and then format it as a JSON number ourselves, but I'd rather not.

JustAnotherArchivist avatar Jul 15 '23 18:07 JustAnotherArchivist

If you're worried about edge cases like infinity and negative zero, then as long as we have tests for them, I don't see any harm relying on str(decimal.Decimal(...)).

bwoodsend avatar Jul 15 '23 19:07 bwoodsend

Those edge cases are behaving. I can't think of anything that would produce weird output. I guess it just annoys me that it isn't documented, which means it shouldn't be relied on. Yeah, I'd add plenty of tests.

JustAnotherArchivist avatar Jul 15 '23 21:07 JustAnotherArchivist

I do find it weird that str(decimal.Decimal(math.inf)) != str(math.inf)

bwoodsend avatar Jul 15 '23 21:07 bwoodsend

Fortunately, the former produces the output we want, and same for math.nan. :-)

JustAnotherArchivist avatar Jul 15 '23 21:07 JustAnotherArchivist

I read through the code I linked above. There's one special case: signalling NaNs would become sNaN. I'm not sure you can produce an actual signalling NaN in Python, but you certainly can do decimal.Decimal('sNaN'). Everything else should result in a valid JSON number (or NaN/Infinity/-Infinity extension). The context is only used for deciding whether or not the exponent separator E should be capitalised, but both variants are valid JSON, so we can ignore that.

JustAnotherArchivist avatar Jul 15 '23 23:07 JustAnotherArchivist

I would argue that the dumping behaviour is correct. Numbers in JSON are not floats but arbitrarily long sequences of digits.

The claim "Numbers in JSON are not floats but arbitrarily long sequences of digits." is a bit dubious. For example RFC 8259 states that implementations should, for maximum interoperability, assume that JSON numbers can get rounded to 64-bit floats or something even less precise and should not treat them as arbitrary precision numbers (see https://datatracker.ietf.org/doc/html/rfc8259#section-6).

FeldrinH avatar Jan 24 '24 21:01 FeldrinH