Handling of infinity and NaN

Open vinniefalco opened this issue 3 years ago • 7 comments

Specifically, if the user puts an inf or NaN into a value, what happens on serialization? We need to specify this. If we output "inf" and "nan", then we also need to specify what happens on parsing, and we might need to provide a configuration option for allowing or disallowing them on parse.

vinniefalco avatar Sep 22 '20 17:09 vinniefalco

I have also bumped into this while testing. NaN is serialized as such, but parsing the output then throws because the parser does not recognize it.

accelerated avatar Dec 16 '21 22:12 accelerated

Currently it produces invalid json (https://gcc.godbolt.org/z/31ebz6b9q).

I personally serialize to reuse it in javascript (and I imagine that would be quite common) so just putting in a string would work out alright for that. "Infinity" * 42 == Infinity, because javascript.

Alternatively, 1e800 evaluates to Infinity (and -1e800 to -Infinity) in JS, and NaN could be written as null. I don't know how well that works for other languages though.

klemens-morgenstern avatar Feb 04 '22 14:02 klemens-morgenstern

@pdimov suggested using a struct like this:

struct serialize_options {
  string_view pinf;
  string_view ninf;
  string_view nan;
};

to control serializer's behaviour.

Usage would be something like

os << serialize(jv, { .pinf = "null", .ninf = "null", .nan = "null" });

If a string_view is empty and the corresponding special value is encountered, the serializer would error out.
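
A minimal sketch of that rule, using only the standard library. The helper `write_double` is hypothetical and is not part of Boost.JSON; the choice of `std::errc::invalid_argument` and the use of `std::to_string` for finite values are placeholders, not the library's behaviour.

```cpp
#include <cmath>
#include <string>
#include <string_view>
#include <system_error>

// Mirrors the struct proposed above; an empty view means "no representation".
struct serialize_options {
    std::string_view pinf; // text emitted for +infinity
    std::string_view ninf; // text emitted for -infinity
    std::string_view nan;  // text emitted for NaN
};

// Hypothetical helper (an assumption, not a Boost.JSON function).
// Appends the text for `d` to `out`. If `d` is a special value and no
// representation was supplied, sets `ec` and leaves `out` unchanged.
inline void write_double(double d, const serialize_options& opts,
                         std::string& out, std::error_code& ec)
{
    ec.clear();
    std::string_view special;
    if (std::isnan(d)) {
        special = opts.nan;
    } else if (std::isinf(d)) {
        special = d > 0 ? opts.pinf : opts.ninf;
    } else {
        out += std::to_string(d); // placeholder formatting for finite values
        return;
    }
    if (special.empty()) {
        // Placeholder error value; a real library would define its own.
        ec = std::make_error_code(std::errc::invalid_argument);
        return;
    }
    out.append(special);
}
```

With this shape, a default-constructed `serialize_options{}` rejects every special value, and supplying a string for one of the three members enables only that one.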

@vinniefalco has added that sometimes users would want to know if a special value has been encountered even when they do provide a string to serialize it as. So, maybe serializer should always produce an error_code when a special value is encountered.

Last thing to consider: what should be the default behaviour, i.e. when serialize_options is not used?

grisumbras avatar Feb 04 '22 16:02 grisumbras

The default should be the same as passing {}, that is, throw/error.

This may not technically be the most correct default, but it's the most intuitive given the interface. The rule is, if you provide a representation for a special value, this representation is used, otherwise, there's an error on encountering it.

No, I don't think the serializer should error out even when a representation is supplied.

pdimov avatar Feb 04 '22 20:02 pdimov

I agree with Klemens here. JSON is closely linked with the value models of JavaScript and to a lesser extent, python.

If you’re going to handle non-numbers at all, the treatment should model that of JavaScript, otherwise interoperability becomes a headache - you’ll have to remember two rule sets and code up the conversions between the two.

madmongo1 avatar Feb 04 '22 20:02 madmongo1

and to a lesser extent, python.

That's exactly what the current behavior is.

pdimov avatar Feb 04 '22 20:02 pdimov

@vinniefalco has added that sometimes users would want to know if a special value has been encountered even when they do provide a string to serialize it as. So, maybe serializer should always produce an error_code when a special value is encountered.

But this is not the same thing. A non-successful error code is non-recoverable. You can't "always produce an error code when a special value is encountered." If we are going to aggregate statistics on the number of non-compliant numbers encountered, they need to be communicated out of band. For example by calling a separate function which returns the statistics, when serialization is complete. Since this can always be added later as a feature if/when needed, without affecting the design of the original topic of this issue, we don't have to worry about it.
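
One way the out-of-band idea could look, as a rough sketch only. `counting_writer` is a hypothetical class, not Boost.JSON's serializer; it assumes special values are replaced with null and simply counted, so the caller can query the count once serialization is complete.

```cpp
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical writer (an assumption, not a Boost.JSON type). It writes a
// comma-separated list of numbers, substitutes "null" for inf/NaN, and keeps
// a count of substitutions instead of reporting an error.
class counting_writer {
    std::string out_;
    std::size_t special_ = 0;

public:
    void write(double d) {
        if (!out_.empty())
            out_ += ',';
        if (std::isfinite(d)) {
            out_ += std::to_string(d); // placeholder formatting
        } else {
            out_ += "null";
            ++special_;
        }
    }

    // The serialized text produced so far.
    const std::string& str() const { return out_; }

    // Statistics are read separately, after writing; no error code involved.
    std::size_t special_count() const { return special_; }
};
```

Serialization always succeeds here, and a caller who cares about non-compliant numbers checks `special_count()` afterwards.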

vinniefalco avatar Feb 04 '22 21:02 vinniefalco

what about deserialize?

x10000year avatar Nov 14 '22 07:11 x10000year

You mean parse? JSON spec doesn't support infinities and NaNs. If it did, this issue wouldn't have existed.

grisumbras avatar Nov 14 '22 19:11 grisumbras

A very simple approach that would let us support infinity (but not NaN) is to serialize infinity as 1e999. Many implementations (including ours) parse that as infinity.

grisumbras avatar May 04 '23 13:05 grisumbras

1e999 is a valid long double, so we should avoid it and use, e.g. 1e99999.
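
A small standard-library check of the overflow behaviour, as a sketch. The helper names are hypothetical. Whether 1e999 is finite as a `long double` depends on the platform (it is where `long double` is the x87 80-bit or a 128-bit format, but not where it is the same as `double`), so the code makes no claim about that case.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Hypothetical helper: true if the literal overflows to +/-infinity when
// converted to double by the C library.
inline bool parses_as_infinity(const char* literal) {
    assert(literal != nullptr);
    return std::isinf(std::strtod(literal, nullptr));
}

// Same check for long double; the result for "1e999" is platform-dependent.
inline bool parses_as_infinity_ld(const char* literal) {
    assert(literal != nullptr);
    return std::isinf(std::strtold(literal, nullptr));
}
```

Both helpers report infinity for 1e99999, since that exponent exceeds the range of every common `long double` format as well as `double`.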

pdimov avatar May 04 '23 14:05 pdimov

To repeat what was said by @pdimov on Slack, these are the options:

  • Infinity, -Infinity, NaN
  • 1e99999, -1e99999, null
  • 1e99999, -1e99999, throw
  • throw, throw, throw

We probably don't want to throw here. This leaves the first two options. I personally prefer option 2 (1e99999, -1e99999, null), because it's valid JSON. Option 1 has the benefit of not changing source values to something different, and this non-standard syntax is supported by several popular implementations (Python and RapidJSON with an option).

So, how about option 2 by default and option 1 enabled explicitly?
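
A sketch of how the two modes could map special values to text. `special_number_text` and its `nonstandard_literals` flag are hypothetical names chosen for illustration, not Boost.JSON API.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (an assumption, not a Boost.JSON function).
// Returns the text for a special value, or an empty string for finite input.
//   nonstandard_literals == false: option 2 (1e99999, -1e99999, null)
//   nonstandard_literals == true:  option 1 (Infinity, -Infinity, NaN)
inline std::string special_number_text(double d, bool nonstandard_literals) {
    if (std::isnan(d))
        return nonstandard_literals ? "NaN" : "null";
    if (std::isinf(d)) {
        if (nonstandard_literals)
            return d > 0 ? "Infinity" : "-Infinity";
        return d > 0 ? "1e99999" : "-1e99999";
    }
    return {}; // finite values are formatted elsewhere
}
```

Note that the default mode is lossy for NaN: null does not turn back into a number on parse, whereas the overflowing literals do round-trip to infinity in parsers that saturate on overflow.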

grisumbras avatar May 04 '23 14:05 grisumbras

Fine with me in principle. What would this look like in the API?

pdimov avatar May 04 '23 14:05 pdimov

Something like

auto s = serialize( jv, {.explicit_special_numbers=true} );
auto jv2 = parse( s, {.explicit_special_numbers=true} );

grisumbras avatar May 04 '23 14:05 grisumbras

The symmetry is appealing; maybe this is better than my suggestion above.

pdimov avatar May 04 '23 18:05 pdimov