Add ability to serialise Decimal as string
Currently `Decimal` serialises to a float by default, which is undesirable when working with money, for instance. The `default` option cannot be used for this since ujson does not raise an error for `Decimal`, so a clean way to specify that `Decimal` should be serialised as a string in `dumps` would be very helpful.
To be clear, the expected behaviour is:
>>> ujson.dumps(decimal.Decimal("1.123123123123123123123123123"))
'"1.123123123123123123123123123"'
i.e. the output is a JSON string and there's no rounding?
And setting the `default` parameter to something that calls `str()` if given a `Decimal()` doesn't solve that problem, because `default` is only called if ujson can't handle the value on its own, and ujson thinks that converting `Decimal()` instances to floats is intended handling.
Current behaviour:
>>> ujson.dumps({"amount": Decimal("15.35")})
'{"amount":15.35}'
Expected behaviour:
>>> ujson.dumps({"amount": Decimal("15.35")}, decimal_as_string=True)
'{"amount":"15.35"}'
I would argue that the dumping behaviour is correct. Numbers in JSON are not floats but arbitrarily long sequences of digits.
For parsing, we should support `parse_float=decimal.Decimal` as is mentioned in the stdlib `json` documentation. There is already an open issue for that: #401
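For reference, the stdlib hook works like this; #401 asks for the same shape of API in ujson:
>>> import decimal
>>> import json
>>> json.loads('{"amount": 15.35}', parse_float=decimal.Decimal)
{'amount': Decimal('15.35')}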
Honestly, I'd rather delete the current handling of `Decimal()` and force people to either normalize their data into standard objects before passing it to `ujson.dumps()` or set `default` to a function using either `str()` or `float()`. This is the same issue as serialising `datetime()`s – it's ambiguous and shouldn't be the job of any JSON library to guess appropriate approximations of incompatible types.
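For illustration, such a hook might look like the sketch below (`to_jsonable` is a hypothetical name; stdlib `json` is used to demonstrate it because, unlike current ujson, it has no built-in `Decimal` handling and so actually calls the hook):
>>> import datetime, decimal, json
>>> def to_jsonable(obj):
...     if isinstance(obj, decimal.Decimal):
...         return str(obj)  # exact digits, but typed as a JSON string
...     if isinstance(obj, datetime.datetime):
...         return obj.isoformat()
...     raise TypeError(type(obj).__name__)
...
>>> json.dumps({"amount": decimal.Decimal("15.35")}, default=to_jsonable)
'{"amount": "15.35"}'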
I disagree. `decimal.Decimal` can be directly mapped to a JSON number. It's just a `-?\d+(\.\d+)?(E[+-]\d+)?` number in almost all cases. There are two special cases, `NaN` and `Infinity` (positive or negative), which we already support for floats anyway as an unambiguous extension to JSON. Positive and negative zero are not a problem either. So encoding to JSON is not ambiguous. Parsing is a different story, but we rightly don't do anything there.
The current implementation is definitely not right, though, since it converts to a float first. This results in rounding and loses the number of significant figures (e.g. the trailing zero in `1.230`).
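The loss is visible at the float conversion step alone, before any JSON formatting is involved:
>>> import decimal
>>> float(decimal.Decimal('1.230'))
1.23
>>> float(decimal.Decimal('1.23000000000000001'))
1.23
>>> float(decimal.Decimal('0.123456789012345678901234567890'))
0.12345678901234568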
Some examples of what I would expect:
>>> ujson.dumps(decimal.Decimal('1.230'))
'1.230'
>>> ujson.dumps(decimal.Decimal('1.23000000000000001'))
'1.23000000000000001'
>>> ujson.dumps(decimal.Decimal('0.123456789012345678901234567890'))
'0.123456789012345678901234567890'
>>> ujson.dumps(decimal.Decimal('-0'))
'-0'
>>> ujson.dumps(decimal.Decimal('-0.00'))
'-0.00'
>>> ujson.dumps(decimal.Decimal('1e1'))
'1E+1' # or equivalent; this is the Python 3.10 decimal normalisation behaviour
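For what it's worth, these are exactly the outputs `str()` gives for the same inputs, which is relevant to the implementation route discussed below:
>>> import decimal
>>> [str(decimal.Decimal(s)) for s in ('1.230', '-0', '-0.00', '1e1')]
['1.230', '-0', '-0.00', '1E+1']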
I guess that if we went down the route of decimals written exactly as JSON numbers then we'd have to implement #401 or it's all a bit of a pointless exercise.
Sod it, I don't mind which route we choose.
It wouldn't be entirely pointless, but yeah, I agree we should implement that anyway (if only for compatibility with stdlib).
I looked a bit into what would be needed for this. `str(decimal.Decimal(...))` isn't really documented. I think it should always produce a valid JSON number, and simplejson uses it, so it's probably safe enough, but there's no documentation of the exact format. The relevant code for it is this, but I haven't read through it yet. The alternative would be to use `decimal.Decimal.as_tuple` and then format it as a JSON number ourselves, but I'd rather not.
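(For reference, `as_tuple` exposes the sign, digits and exponent directly, so hand-formatting would be doable, just tedious:)
>>> import decimal
>>> decimal.Decimal('1.230').as_tuple()
DecimalTuple(sign=0, digits=(1, 2, 3, 0), exponent=-3)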
If you're worried about edge cases like infinity and negative zero, then as long as we have tests for them, I don't see any harm in relying on `str(decimal.Decimal(...))`.
Those edge cases are behaving. I can't think of anything that would produce weird output. I guess it just annoys me that it isn't documented, which means it shouldn't be relied on. Yeah, I'd add plenty of tests.
I do find it weird that `str(decimal.Decimal(math.inf)) != str(math.inf)`. Fortunately, the former produces the output we want, and the same goes for `math.nan`. :-)
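Concretely:
>>> import decimal, math
>>> str(decimal.Decimal(math.inf)), str(math.inf)
('Infinity', 'inf')
>>> str(decimal.Decimal(math.nan)), str(math.nan)
('NaN', 'nan')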
I read through the code I linked above. There's one special case: signalling NaNs would become `sNaN`. I'm not sure you can produce an actual signalling NaN in Python, but you certainly can do `decimal.Decimal('sNaN')`. Everything else should result in a valid JSON number (or the `NaN`/`Infinity`/`-Infinity` extension). The context is only used for deciding whether or not the exponent separator `E` should be capitalised, but both variants are valid JSON, so we can ignore that.
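That special case is easy to reproduce, and its output is neither a valid JSON number nor part of the `NaN`/`Infinity` extension, so it would at least need a test:
>>> import decimal
>>> str(decimal.Decimal('sNaN'))
'sNaN'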
> I would argue that the dumping behaviour is correct. Numbers in JSON are not floats but arbitrarily long sequences of digits.
The claim "Numbers in JSON are not floats but arbitrarily long sequences of digits." is a bit dubious. For example RFC 8259 states that implementations should, for maximum interoperability, assume that JSON numbers can get rounded to 64-bit floats or something even less precise and should not treat them as arbitrary precision numbers (see https://datatracker.ietf.org/doc/html/rfc8259#section-6).