Proper or custom JSON serialization of non-finite float values

Open Dzeri96 opened this issue 1 year ago • 2 comments

Feature or enhancement

The goal of this feature is to allow the JSON serializer in stdlib to serialize non-finite values (NaN, Inf, -Inf) according to the JSON specification. Going beyond just conforming to the spec, we could allow for custom serialization behavior.

Previous discussion

This problem was previously discussed in #84813, and a related PR was submitted in 2019. However, @mdickinson suggested I open a fresh issue where we can discuss the implementation in depth.

Pitch

Currently, Python's default JSON serializer encodes values like NaN as-is, with the explanation being that many JS-based JSON libraries also do this, and that the corresponding parsers can handle such non-conforming input. In reality, most major browsers do not support this type of encoding, and even NodeJS (v14.16.0) acts according to the previously-linked JSON spec. The keyword argument allow_nan makes the serializer throw when encountering non-finite values when set to False, but I'd argue it is paramount to ensure compatibility with the spec and modern browsers. Changing the default behavior is of course not needed or possible at this point.
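For reference, here is a minimal sketch of the current behavior, using only the stdlib json module. The exact wording of the error message is not shown because it may differ between Python versions.

```python
import json
import math

values = [math.nan, math.inf, -math.inf]

# Default (allow_nan=True): non-finite floats are written as
# JavaScript-style literals, which are not valid JSON.
print(json.dumps(values))  # [NaN, Infinity, -Infinity]

# allow_nan=False: the encoder raises ValueError instead of
# writing those literals.
try:
    json.dumps(values, allow_nan=False)
except ValueError as exc:
    print("ValueError:", exc)
```

Neither setting produces null, which is the gap this issue is about.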

When implementing this feature, there are two main decisions to make.

Firstly, it has to be decided if allow_nan should be extended to take more datatypes like strings and callables, or if we should create a separate argument for this functionality. Repurposing allow_nan would centralize control over this behavior; however, the name is very limiting. It doesn't say anything about other non-finite values, and without looking at the docs, one would think it only takes bool values.

Secondly, it has to be decided how far we want to take this feature. Do we want to have pre-defined cases like as_is, throw, and to_null, or do we want to allow the user to pass their own callable? The latter is implemented by the linked PR. Having both options is also a possibility.
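To make the two options concrete, here is a rough sketch of how pre-defined cases and a user-supplied callable could coexist. It is written as a pre-processing step in front of json.dumps, not as a change to the encoder, and it is not the approach taken by the linked PR. The names replace_non_finite, dumps_non_finite, on_non_finite and MODES are hypothetical and exist only for illustration.

```python
import json
import math


def replace_non_finite(obj, handler):
    """Return a copy of obj with each non-finite float replaced by handler(value)."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return handler(obj)
    if isinstance(obj, dict):
        # Only values are processed; non-finite float keys are left untouched.
        return {k: replace_non_finite(v, handler) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [replace_non_finite(v, handler) for v in obj]
    return obj


def _throw(value):
    raise ValueError(f"non-finite float not allowed: {value!r}")


# Hypothetical pre-defined cases, named after the ones mentioned above.
MODES = {
    "as_is": lambda value: value,
    "throw": _throw,
    "to_null": lambda value: None,
}


def dumps_non_finite(obj, on_non_finite="as_is", **kwargs):
    """Hypothetical wrapper: on_non_finite is a mode name or a callable."""
    if isinstance(on_non_finite, str):
        handler = MODES[on_non_finite]
    else:
        handler = on_non_finite
    return json.dumps(replace_non_finite(obj, handler), **kwargs)


data = {"a": math.nan, "b": [1.0, math.inf]}
print(dumps_non_finite(data, "to_null"))  # {"a": null, "b": [1.0, null]}
print(dumps_non_finite(data, str))        # {"a": "nan", "b": [1.0, "inf"]}
```

Accepting either a mode name or a callable in one argument corresponds to the "both options" possibility. A real implementation would presumably live inside the encoder to avoid copying the input.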

Overall, each combination of decisions has its advantages and drawbacks. Since I wasn't a part of such discussions before, I don't have a preference. All I want is to see this feature get implemented, and I can create a PR once consensus is reached.

Dzeri96 avatar Oct 15 '22 21:10 Dzeri96

@Dzeri96 Thanks for opening the issue!

There was a fair bit of confusion in #84813; it would be good to eliminate that confusion up front here. To clarify, when you say:

> to serialize non-finite values (NaN, Inf, -Inf) according to the JSON specification

An option for this already exists, via allow_nan = False. I think you (and others) were asking for something different, namely an option to convert NaNs and Infinities to null. Is that correct?

mdickinson avatar Oct 16 '22 07:10 mdickinson

> An option for this already exists, via allow_nan = False.

After taking a second look at the spec, I guess throwing an exception instead of producing any output could be considered conformant. In any case, yes, I'm suggesting we add options for conversion to null or even custom behavior as explained above.

Dzeri96 avatar Oct 16 '22 09:10 Dzeri96

I know the last week has probably been very hectic with the release of 3.11, but I'd hate for this issue to be buried like the PR from 2019. Any tips on how to promote discussion and push this to get reviewed? @mdickinson

Dzeri96 avatar Oct 26 '22 11:10 Dzeri96

@Dzeri96 Maybe a ping on https://discuss.python.org? I'm afraid that I'm not personally likely to have time to look at this this side of Christmas.

mdickinson avatar Oct 26 '22 18:10 mdickinson

A thread has been created here. Let's close this issue and continue discussion there.

Dzeri96 avatar Oct 28 '22 09:10 Dzeri96