Inconsistent Behaviour of NaN & Null Values
Description
Let us assume we have `nan = std::numeric_limits<double>::quiet_NaN()` and `null = nlohmann::json()` in our json object. Then the first is a number and NOT null, whereas the second is null and NOT a number. However, in string form, both appear as `null`.
So when we dump everything to a json string and then parse it back into a json object, both values have become `null`, and neither is a number. When we then try to convert either of them back to a double, we get `type must be number, but is null`.
Reproduction steps
See Minimal code example or https://godbolt.org/z/b9n7Eq9qv
Expected vs. actual results
Both `std::numeric_limits<double>::quiet_NaN()` and `nlohmann::json()` are displayed as `null` in a json file. So a `nlohmann::json` object should treat both the same at any given point in time. Two solutions:
- Either immediately convert `std::numeric_limits<double>::quiet_NaN()` to a true `null` object (which is not a number), so that json in memory is consistent with its json string representation.
- Or allow `null` to be cast to `NaN` when converting to `double`.

Since NaN serializes to null, I expect that converting a json null object to a double yields NaN. That is, solution 2.
Minimal code example
#include <nlohmann/json.hpp>
#include <iostream>
#include <limits>

int main()
{
    using json = nlohmann::json;
    const double NaN = std::numeric_limits<double>::quiet_NaN();

    json jsonData1 = {1.72, NaN, json()};
    const json jsonData2 = json::parse(jsonData1.dump());

    std::cout << "DATA1: " << jsonData1.dump() << "\n";
    std::cout << "DATA2: " << jsonData2.dump() << "\n\n";

    for (const auto& v : jsonData1)
        std::cout << ">> is_null=" << v.is_null() << ", is_number=" << v.is_number() << ", value='" << v << "'\n";
    std::cout << "\n";
    for (const auto& v : jsonData2)
        std::cout << ">> is_null=" << v.is_null() << ", is_number=" << v.is_number() << ", value='" << v << "'\n";

    const double v1 = jsonData1.at(1);
    // const double v2 = jsonData2.at(1); // ERROR: type must be number, but is null
    return 0;
}
Error messages
ERROR: type must be number, but is null
Compiler and operating system
Linux (gcc 12.2), Windows (msvc v19.33)
Library version
3.11.1 or 3.11.2
Validation
- [X] The bug also occurs if the latest version from the `develop` branch is used.
- [X] I can successfully compile and run the unit tests.
See https://json.nlohmann.me/features/types/number_handling/#nan-handling
Understood.
However, if we have nlohmann::json obj = {1.72, NaN, null}, then the round-trip
obj == nlohmann::json::parse(obj.dump())
fails because nlohmann::json::parse(obj.dump()) gives {1.72, null, null} instead.
The json strings are equal, but the json objects are different, which seems like an inconsistency.
It is behaving exactly as in the linked document:
That is, there is no way to parse a NaN value. However, NaN values can be stored in a JSON value by assignment.
This library serializes NaN values as null. This corresponds to the behavior of JavaScript's JSON.stringify function.
Hi. I am not trying to debate whether this is documented or not. I am raising the point that the json string & json object are not consistent.
On our side this leads to problems, as null & nan are treated differently in memory, while being the same after we output the data to a json string. So when we load the json data back from a string (or a file), the data in memory is now distinctly different, which breaks some of our workflows.
What do you propose?
Two solutions:
- Whenever a C++ double value that is `NaN` is passed into the json object, convert it to a true `null` object. That way, the json string & in-memory json are the same.
- And/or: allow `null` to be cast to `NaN` when it is explicitly cast to double, e.g. via `obj.get<double>()`.
1 is not technically possible in all cases due to the `get_ref` function, which can return a `double&` and allow the value to be changed without involving the library at all. Also, it would be very surprising (and a breaking change) if assigning NaN silently changed the type of the value.
2 is incorrect as it's not necessarily a NaN that caused the value to be null. It could just have been a null from the beginning.
Are you not able to keep your workflows from generating NaN values and storing them into the json objects? Could you create your own wrapper function around the double assignment that does the NaN check and converts it to a null assignment?
Hey, I came across the same case, in this example:
#include <cmath>
#include <iostream>
#include <nlohmann/json.hpp>

using namespace std;
using json = nlohmann::json;

int main()
{
    json dict = json::object();
    {
        double nan = NAN;
        double val = 123;
        cout << "A nan: " << nan << ", val:" << val << endl;
        dict["nan"] = nan;
        dict["val"] = val;
    }
    cout << "B dict: " << dict.dump() << endl;
    cout << "\tisnan: " << isnan(dict["nan"].get<double>()) << " " << isnan(dict["val"].get<double>()) << endl;
    cout << "\t'nan'" << ", is_null: " << dict["nan"].is_null() << ", is_number: " << dict["nan"].is_number() << endl;
    cout << "\t'val'" << ", is_null: " << dict["val"].is_null() << ", is_number: " << dict["val"].is_number() << endl;
    {
        double nan = dict["nan"].get<double>();
        double val = dict["val"].get<double>();
        cout << "C nan: " << nan << ", val:" << val << endl;
    }
    dict = json::parse(dict.dump());
    cout << "D dict: " << dict.dump() << endl;
    cout << "\t'nan'" << ", is_null: " << dict["nan"].is_null() << ", is_number: " << dict["nan"].is_number() << endl;
    cout << "\t'val'" << ", is_null: " << dict["val"].is_null() << ", is_number: " << dict["val"].is_number() << endl;
    {
        double nan = 0;
        nan = dict["nan"].get<double>(); // <------ json throws an exception here
        cout << "E nan: " << nan << endl;
    }
}
A nan: nan, val:123
B dict: {"nan":null,"val":123.0}
isnan: 1 0
'nan', is_null: 0, is_number: 1
'val', is_null: 0, is_number: 1
C nan: nan, val:123
D dict: {"nan":null,"val":123.0}
'nan', is_null: 1, is_number: 0
'val', is_null: 0, is_number: 1
And I send this json over the internet; it is parsed on the server side and then dumped to a file. In the end, the server can't open the json again.
Third solution (maybe made optional via a define?):
3. Throw an exception when setting NaN into a json: `dict["nan"] = NAN;`
Adding an exception when assigning a NaN to a JSON value has two problems:
- It is a breaking change, and we don't want to add those lightly.
- It would be wrong to forbid storing `NaN`, because it is perfectly fine to serialize NaN in formats like CBOR which do support `NaN`.
Any idea to proceed here without adding a breaking change?
The crux of the problem is that the round-trip `const json data2 = json::parse(data1.dump());` creates inconsistent behaviour between data1 and data2. Similarly to trzeciak, I send numerical json data over the wire. So if the objects behave differently before & after serialisation, it's a big problem. And it's impossible for me to catch or handle every NaN.
To recap, so far 3 solutions have been proposed:
- Auto-convert `NaN` to `null` inside a json object
- Allow conversion of `null` to `NaN` via `get<double>()`
- Throw an error when `NaN` is set to a json object
I personally still think solution 2 has merit. Because irrespective of whether a null was originally a true null or a NaN (or something else), converting null to NaN just feels more natural than throwing an error.
Nonetheless, another solution:
- On https://json.nlohmann.me/api/basic_json/dump it says "The function tries to mimic Python's json.dumps() function". And https://docs.python.org/3.10/library/json.html#infinite-and-nan-number-values says:
  [...] By default, this module accepts and outputs `Infinity`, `-Infinity`, and `NaN` as if they were valid JSON number literal values.
So Python supports NaN and it's even enabled by default. We can control this behaviour via the allow_nan parameter. This parameter could be added to nlohmann json's dump & parse function:
string_t dump(int indent, char indent_char, bool ensure_ascii, error_handler_t, bool allow_nan);
basic_json parse(InputType&& i, parser_callback_t cb, bool allow_exceptions, bool ignore_comments, bool allow_nan);
Then we can write `const json data2 = json::parse(data1.dump(-1, ' ', false, json::error_handler_t::strict, true), nullptr, true, false, true);` and we truly have `data1 == data2`.
Of course, listing all the parameters is a bit cumbersome. So maybe new overloads or new functions could help:
// New dump overload like string_t dump(nan_policy_t) const
// where nan_policy_t = enum class { allow, forbid };
const json data2 = json::parse(data1.dump(nan_policy_t::allow), nan_policy_t::allow);
// New "..._with_nan" functions
const json data2 = json::parse_with_nan(data1.dump_with_nan());
However, if nlohmann::json's default for allow_nan is true, then this may be a breaking change (again, for some users). But it would follow Python's example (as promised in the docs), and it may remove the need for the additional overloads/functions (as I suppose most people would be happy with NaN being supported by default).
Hope this helps.
I also ran into the same issue as OP, as my data contains NaNs that are not easily reducible to nulls. The optional syntax extension proposed by @phil-zxx in the previous post seems like a sensible solution that solves at least my specific problem and is in line with the prior art, even if it is not enabled by default.
EDIT: Enabling NaN support in the parser by default is probably preferred as this change is unlikely to be breaking.
I'm not sure if I care whether it's enabled by default, but could we at least add an optional flag to read/write NaN, Infinity, and -Infinity?
As of now, the library will not support NaN values.
Note that Boost.JSON has decided to write Inf and -Inf as 1e99999 and -1e99999, which seems to handle Inf without violating the JSON spec.
https://github.com/boostorg/json/issues/397
They still write NaN as null, but have a flag for selecting the desired behavior.