nix icon indicating copy to clipboard operation
nix copied to clipboard

toJSON truncates floats

Open pennae opened this issue 3 years ago • 2 comments

Describe the bug

toJSON unnecessarily truncates floats to six significant figures.

Steps To Reproduce

nix-repl> builtins.toJSON 1.000001
"1"

nix-repl> builtins.toJSON 1048576.1
"1.04858e+06"

Expected behavior

nix-repl> builtins.toJSON 1.000001
"1.000001"

nix-repl> builtins.toJSON 1048576.1
"1048576.1"

nix-env --version output

tested 2.4 and 2.5pre20211126_55275fc

Additional context

found while investigating https://github.com/NixOS/nixpkgs/issues/148824

pennae avatar Dec 06 '21 02:12 pennae

This is the place https://github.com/NixOS/nix/blob/master/src/libutil/json.cc#L34 Could be fixed like this str << std::setprecision(9) << n; but maybe it's better to use fmt or boost formatting to output floating point numbers 🤔

mopleen avatar Dec 07 '21 09:12 mopleen

this also affect toString in a different way. toString always produces six fractional digits, rounding small floats like 1e-9 to 0.

pennae avatar Mar 11 '22 23:03 pennae

This can't be changed, because it would change hashes across Nix versions.

flokli avatar Jan 25 '23 17:01 flokli

According to my bisect, this was "fixed" in https://github.com/NixOS/nix/pull/7313 despite the hash changes. We had some scary math issues as a result.

srhb avatar Apr 25 '23 09:04 srhb

This can't be changed, because it would change hashes across Nix versions.

I'd like a more configurable toJSON2 primop that allows the precision to be extended, or optionally disables the collapsing of "outPath" fields.

roberth avatar Apr 30 '23 11:04 roberth

Displaying floats in decimal is apparently pretty complicated science, and there's multiple competing algorithms to come up with the representation, and even more ways to format it.

It's not entirely well-defined how Nix does it either. If you're curious, check the commit message of https://cs.tvl.fyi/depot/-/commit/1facd889bba724cf20ea14422ee1e57440b3e761, which describes the challenges while trying to mimic Nix as much as possible.

I personally think supporting floats to be printed as strings (and by this, allowing that representation to be leaked into output hashing) was a mistake.

I don't think we should add more builtins treating this differently, or allowing to configure the behavior, but rather discourage the usage of floats in output alltogether. We can't remove it from the language, (even though nixpkgs uses it very little), but we at least could print warnings whenever someone converts a float to string, so they know they produce output that might differ across implementations / architectures / compiler versions.

flokli avatar Apr 30 '23 11:04 flokli

I agree. Until we explicitly formalize and test the semantics of non-integral numbers in the Nix language, the current implementation ("whatever some platform code does") should be avoided. Such a project should also include the removal of native floating point operations

  • #8274 It's a non-trivial project to add proper "float" support to Nix.

Nixpkgs/NixOS could sidestep the Nix floats by postprocessing something like mkFloat = value: { _type = "float"; inherit value;} etc. It already postprocesses the toJSON output in a derivation (in pkgs.formats), so that seems within reason. That also solves the precision limitations, as long as users don't need to compute with floats.

roberth avatar Apr 30 '23 12:04 roberth

Some prior art for deterministically specifying float ↔ string conversions can be found in the ECMAScript specification: Number::toString, StringNumericValue. For historical reasons, that specification has two points where an implementation-defined choice can be made:

  • In Number::toString step 5, the implementation may choose between multiple representations of the shortest length that all round-trip to the input. A deterministic specification of the best choice is provided in Note 2.

  • In the RoundMVResult helper for StringNumericValue, the implementation may round excessively-precise inputs up or down to 20 significant digits. A deterministic specification would omit this allowance and use the input as provided.

The Ryū, Grisu-Exact, and Dragonbox algorithms all satisfy this stricter deterministic specification; their difference is in performance and not observable rounding behavior.

andersk avatar Sep 10 '23 07:09 andersk