
make number serialization precision configurable

Open maddanio opened this issue 11 months ago • 12 comments

This PR adds the possibility to control floating-point serialization precision via std::setprecision. I also have a version with an explicit annotator instead, if you absolutely want to avoid behaviour changes (users may have inadvertently modified the iostate in their code right now), but I find this way much more natural (I was quite surprised that std::setprecision had no effect).


Pull request checklist

Read the Contribution Guidelines for detailed information.

  • [x] Changes are described in the pull request, or an existing issue is referenced.
  • [x] The test suite compiles and runs without error.
  • [x] Code coverage is 100%. Test cases can be added by editing the test suite.
  • [x] The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header files single_include/nlohmann/json.hpp and single_include/nlohmann/json_fwd.hpp. The whole process is described here.

maddanio avatar Jan 09 '25 14:01 maddanio

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @maddanio Please read and follow the Contribution Guidelines.

github-actions[bot] avatar Jan 09 '25 14:01 github-actions[bot]

Thanks for looking at this. So does that mean this PR will get merged when I address all of the requests? I was submitting it a bit speculatively and don't want to sink work into it if that's not the case or if more fundamental discussion is needed first.

maddanio avatar Jan 09 '25 19:01 maddanio

@nlohmann has the final say here on whether this can go forward.

Prior to that, I think it would be good to define what exactly precision means here. Is it the maximum number of digits after the decimal point, or the exact number? Having it mean exact is more involved, as it can change a current 0.0 to 0.000000. This should also define whether it uses truncation or rounding. That is, is π with precision 3 serialized as 3.142 or 3.141? Similarly, does 1.999999999 at any precision less than 9 become just 2.0?

One reason I recommended using -1 as an override is that there is more here than just shortening the number. If a precision isn't defined, everything just behaves exactly as it does now. If it is defined, then you have to look at rounding, and thus you may not be able to use Grisu2. I haven't looked into it enough to know whether a rounding precision parameter can be implemented in Grisu2.

Then the next question is whether std::setprecision affects this, or whether it's only configured via parameters to dump, and whether that means it's time to create a configuration method for dump. That is related to the question of whether or not it would affect the ostream << json operators. As mentioned in the initial comment, having std::setprecision affect either of these would be a breaking change.

gregmarr avatar Jan 09 '25 20:01 gregmarr

About maximum vs. exact and truncation vs. rounding:

    std::cout << std::setprecision(3) << M_PI << "\n" << 1.9999 << "\n";

This prints

3.142
2

https://www.godbolt.org/z/8Mne8Y59M

gregmarr avatar Jan 09 '25 21:01 gregmarr

I am very hesitant, and these are just my thoughts right now:

  • I personally don't think the serialization is broken. It does what it should, and the reason we chose the current precision is to enable roundtripping.
  • A lot of people are offended by the way the library does serialization. Partly because they don't understand how floating-point numbers work. How can a library change their 1.9 to 1.89999999999999991118?!
  • Others are concerned about bandwidth and want to limit the number of bytes used to write π to the wire.
  • Still others want to control more aspects of the output, like where \n characters are added, how content is aligned, etc.
  • All this can be implemented somehow, and it's never about "What C++ code is needed to solve this problem?" but rather "How can we put this into an API that people want to use, and other people want to maintain in their free time?"
  • The questions @gregmarr raises are very valid, and I think there is no single right answer to them. If we define precision = "maximal number of digits after the . using truncation rather than rounding", then there will be a feature request later this year asking to change some other point on that list.
  • Also, the library is frequently criticized for its performance. Adding all these extension hooks does not really improve this. Also, the other JSON libraries seem not to care about this aspect (or their users don't ask for it). Performance should not be the "no" to all feature requests, but should definitely be on the list of concerns.
  • It's hard to find the right answer to all this, because in the end, the library is targeting developers, not just myself.

Sorry for this rough brain-dump. I will not be able to make any decision right now. But I think I will not add this without having a clear plan for how serialization can be improved conceptually, rather than adding some more ifs here and there.

nlohmann avatar Jan 09 '25 21:01 nlohmann

I forgot one thing that other libraries do that avoids some of the confusion I see in issues here: they store the original input string and only convert it to a number on demand. As a result, parsing is cheaper and you can create the exact serialization of the input. Not sure if we want to do this here, but a "precision = keep whatever precision I put in" requirement is also thinkable.

nlohmann avatar Jan 09 '25 21:01 nlohmann

I don't think the performance difference is measurable; the only action is changing some pointers and doing some comparisons. If anything, I would surmise performance improves with lower precision.

Yes, it is truncating, so if you want rounding that would be more involved. But if desired I would be willing to dig in.

All in all, we at my company really want this, because we have some rather large sets of numbers in JSON, and this saves us a rather large amount of space and noise. But if you don't think this is what the library needs, we will just continue on our fork. And I fully understand how floats work, but I also know in many cases the number of digits I really need, and what is just noise (in physics, when we calculated stuff, we rarely kept more than 3 significant digits :))

maddanio avatar Jan 10 '25 08:01 maddanio

Several years ago I wrote an app which stored some data in a JSON file (using this library) and hashed part of this file with a crypto hash function. The hashed part was a JSON serialization in a "canonical" format (i.e. no formatting, indentation, or whitespace). It worked because of the aforementioned roundtripping feature.

Then I learned the hard way, when using the Python JSON implementation on the very same data, that nlohmann::json produced output that the Python module was not able to "roundtrip". Apparently nlohmann::json uses different rounding rules than Python's native json module, so some numbers got the last precision bit flipped when read by Python and then serialized again. This would not be a big deal mathematically, but it was a big deal when the text representation was an input to a hashing function.

In the end I had to resort to writing a custom float parser in Python's simplejson which preserved the textual representation of the JSON numbers (i.e. floats) and used that for hashing instead of roundtripping the float value. All this is just to say that even when one library is able to roundtrip, that does not guarantee interoperability with another. Another implication is that if the library changes the way it works with floats, it will break scenarios like the one described above (even if some would argue it is a poor design).

My understanding is that what @maddanio suggests is not in fact changing "precision" (in particular because the proposal does not deal with rounding), but formatting (representation), i.e. being able to control the length of the textual representation.

I believe it would be good if the library supported a way to write a custom serializer/deserializer, in a similar way to how it supports custom conversions for e.g. JSON numbers. This way, whoever wanted a particular format would just supply their own (de)serializer without needing to fork the project or impose the required behavior on the library.

risa2000 avatar Feb 02 '25 20:02 risa2000