Inconsistent Behaviour of NaN & Null Values
Description
Let us assume we have `nan = std::numeric_limits<double>::quiet_NaN()` and `null = nlohmann::json()` in our json object. Then the first is a number and NOT null, whereas the second is null and NOT a number. However, in string form, both appear as `null`.
So when we dump everything to a json string and then parse it back into a json object, both values have become `null`, and neither is a number. When we then try to convert either of them back to a double, we get `type must be number, but is null`.
Reproduction steps
See Minimal code example or https://godbolt.org/z/b9n7Eq9qv
Expected vs. actual results
Both `std::numeric_limits<double>::quiet_NaN()` and `nlohmann::json()` are displayed as `null` in a json file. So a `nlohmann::json` object should treat both the same at any given point in time. Two solutions:
- Either immediately convert `std::numeric_limits<double>::quiet_NaN()` to a true `null` object (which is not a number), so that json in memory is consistent with its json string representation.
- Or allow `null` to be cast to `NaN` when converting to `double`.

Since NaN serializes to null, I expect that converting a json null object to a double yields NaN. That is, solution 2.
Minimal code example
#include <nlohmann/json.hpp>
#include <iostream>
#include <limits>

int main()
{
    using json = nlohmann::json;
    const double NaN = std::numeric_limits<double>::quiet_NaN();

    json jsonData1 = {1.72, NaN, json()};
    const json jsonData2 = json::parse(jsonData1.dump());

    std::cout << "DATA1: " << jsonData1.dump() << "\n";
    std::cout << "DATA2: " << jsonData2.dump() << "\n\n";

    for (const auto& v : jsonData1)
        std::cout << ">> is_null=" << v.is_null() << ", is_number=" << v.is_number() << ", value='" << v << "'\n";
    std::cout << "\n";
    for (const auto& v : jsonData2)
        std::cout << ">> is_null=" << v.is_null() << ", is_number=" << v.is_number() << ", value='" << v << "'\n";

    const double v1 = jsonData1.at(1);
    // const double v2 = jsonData2.at(1); // ERROR: type must be number, but is null
    return 0;
}
Error messages
ERROR: type must be number, but is null
Compiler and operating system
Linux (gcc 12.2), Windows (msvc v19.33)
Library version
3.11.1 or 3.11.2
Validation
- [X] The bug also occurs if the latest version from the `develop` branch is used.
- [X] I can successfully compile and run the unit tests.
See https://json.nlohmann.me/features/types/number_handling/#nan-handling
Understood.
However, if we have nlohmann::json obj = {1.72, NaN, null}, then the round-trip
obj == nlohmann::json::parse(obj.dump())
fails because nlohmann::json::parse(obj.dump()) gives {1.72, null, null} instead.
The json strings are equal, but the json objects are different, which seems like an inconsistency.
It is behaving exactly as in the linked document:
That is, there is no way to parse a NaN value. However, NaN values can be stored in a JSON value by assignment.
This library serializes NaN values as null. This corresponds to the behavior of JavaScript's JSON.stringify function.
Hi. I am not trying to debate whether this is documented or not. I am raising the point that the json string & json object are not consistent.
On our side this leads to problems, as null & nan are treated differently in memory, while being the same after we output the data to a json string. So when we load the json data back from a string (or a file), the data in memory is now distinctly different, which breaks some of our workflows.
What do you propose?
Two solutions:
- Whenever a C++ double value that is `NaN` is passed into the json object, convert it to a true `null` object. That way, the json string & in-memory json are the same.
- And/or: allow `null` to be cast to `NaN` when it is explicitly cast to double, e.g. via `obj.get<double>()`.
1 is not technically possible in all cases due to the `get_ref` function, which can return a `double&` and allow the value to be changed without involving the library at all. Also, it would be very surprising (and a breaking change) if assigning NaN silently changed the type of the value.
2 is incorrect as it's not necessarily a NaN that caused the value to be null. It could just have been a null from the beginning.
Are you not able to keep your workflows from generating NaN values and storing them into the json objects? Could you create your own wrapper function around the double assignment that does the NaN check and converts it to a null assignment?
Hey, I came across the same case, in this example:
#include <cmath>
#include <iostream>
#include <nlohmann/json.hpp>

using namespace std;
using json = nlohmann::json;

int main()
{
    json dict = json::object();
    {
        double nan = NAN;
        double val = 123;
        cout << "A nan: " << nan << ", val:" << val << endl;
        dict["nan"] = nan;
        dict["val"] = val;
    }
    cout << "B dict: " << dict.dump() << endl;
    cout << "\tisnan: " << isnan(dict["nan"].get<double>()) << " " << isnan(dict["val"].get<double>()) << endl;
    cout << "\t'nan'" << ", is_null: " << dict["nan"].is_null() << ", is_number: " << dict["nan"].is_number() << endl;
    cout << "\t'val'" << ", is_null: " << dict["val"].is_null() << ", is_number: " << dict["val"].is_number() << endl;
    {
        double nan = dict["nan"].get<double>();
        double val = dict["val"].get<double>();
        cout << "C nan: " << nan << ", val:" << val << endl;
    }
    dict = json::parse(dict.dump());
    cout << "D dict: " << dict.dump() << endl;
    cout << "\t'nan'" << ", is_null: " << dict["nan"].is_null() << ", is_number: " << dict["nan"].is_number() << endl;
    cout << "\t'val'" << ", is_null: " << dict["val"].is_null() << ", is_number: " << dict["val"].is_number() << endl;
    {
        double nan = 0;
        nan = dict["nan"].get<double>(); // <------ json throws an exception here
        cout << "E nan: " << nan << endl;
    }
}
A nan: nan, val:123
B dict: {"nan":null,"val":123.0}
isnan: 1 0
'nan', is_null: 0, is_number: 1
'val', is_null: 0, is_number: 1
C nan: nan, val:123
D dict: {"nan":null,"val":123.0}
'nan', is_null: 1, is_number: 0
'val', is_null: 0, is_number: 1
And I send this json over the internet; it is parsed on the server side and then dumped to a file. In the end, the server can't open the json again.
Third solution (maybe made optional via a define?):
3. Throw an exception when setting NaN into a json: `dict["nan"] = NAN;`
Adding an exception when assigning a NaN to a JSON value has two problems:
- It is a breaking change, and we don't want to add those lightly.
- It would be wrong to forbid storing `NaN`, because it is perfectly fine to serialize NaN in formats like CBOR which do support `NaN`.
Any idea to proceed here without adding a breaking change?
The crux of the problem is that the round-trip `const json data2 = json::parse(data1.dump());` creates inconsistent behaviour between data1 and data2. Similarly to trzeciak, I send numerical json data over the wire. So if the objects behave differently before & after serialisation, it's a big problem. And it's impossible for me to catch or handle every NaN.
To recap, so far 3 solutions have been proposed:
- Auto-convert `NaN` to `null` inside a json object
- Allow conversion of `null` to `NaN` via `get<double>()`
- Throw an error when `NaN` is set to a json object
I personally still think solution 2 has merit. Because irrespective of whether a null was originally a true null or a NaN (or something else), converting null to NaN just feels more natural than throwing an error.
Nonetheless, another solution:
- On https://json.nlohmann.me/api/basic_json/dump it says "The function tries to mimic Python's json.dumps() function". And https://docs.python.org/3.10/library/json.html#infinite-and-nan-number-values says:
  [...] By default, this module accepts and outputs `Infinity`, `-Infinity`, and `NaN` as if they were valid JSON number literal values.
So Python supports NaN and it's even enabled by default. We can control this behaviour via the allow_nan parameter. This parameter could be added to nlohmann json's dump & parse function:
string_t dump(int indent, char indent_char, bool ensure_ascii, error_handler_t, bool allow_nan);
basic_json parse(InputType&& i, parser_callback_t cb, bool allow_exceptions, bool ignore_comments, bool allow_nan);
Then we can write `const json data2 = json::parse(data1.dump(-1, ' ', false, json::error_handler_t::strict, true), nullptr, true, false, true);` and we truly have `data1 == data2`.
Of course, listing all the parameters is a bit cumbersome. So maybe new overloads or new functions could help:
// New dump overload like string_t dump(nan_policy_t) const
// where nan_policy_t = enum class { allow, forbid };
const json data2 = json::parse(data1.dump(nan_policy_t::allow), nan_policy_t::allow);
// New "..._with_nan" functions
const json data2 = json::parse_with_nan(data1.dump_with_nan());
However, if nlohmann::json's default for allow_nan is true, then this may be a breaking change (again, for some users). But it would follow Python's example (as promised in the docs), and it may remove the need for the additional overloads/functions (as I suppose most people would be happy with NaN being supported by default).
Hope this helps.
I also ran into the same issue as OP, as my data contains NaNs that are not easily reducible to nulls. The optional syntax extension proposed by @phil-zxx in the previous post seems like a sensible solution that solves at least my specific problem and is in line with the prior art, even if it is not enabled by default.
EDIT: Enabling NaN support in the parser by default is probably preferred as this change is unlikely to be breaking.
I'm not sure if I care whether it's enabled by default, but could we at least add an optional flag to read/write NaN, Infinity, and -Infinity?
As of now, the library will not support NaN values.
Note that Boost.JSON has decided to write Inf and -Inf as 1e99999 and -1e99999, which seems to handle Inf without violating the JSON spec.
https://github.com/boostorg/json/issues/397
They still write NaN as null, but have a flag for selecting the desired behavior.