Nix language: inconsistent handling of numbers
Describe the bug
After noticing #12848 I started to investigate other strange corner cases of the Nix language related to numbers. This issue lists several corner cases, which should be either documented or fixed.
Handling of 0.0
1.
1. is parsed as 1.0 (here, I'm adding the point to indicate a Nix float), while 0. isn't parsed, i.e.
$ nix eval --expr '0.'
error: syntax error, unexpected end of file, expecting ID or OR_KW or DOLLAR_CURLY or '"'
at «string»:1:3:
1| 0.
| ^
Expected behavior: 0. should be parsed as 0.0
2.
1 / 0.0 throws an error instead of returning inf:
$ nix eval --expr '1 / 0.0'
error:
… while calling the 'div' builtin
at «string»:1:3:
1| 1 / 0.0
| ^
error: division by zero
Computing 1. / (1. / x) throws an error, if x = inf or x = -inf.
inf can be generated easily:
$ nix eval --expr '1.0e200 * 1.0e200'
inf
3.
-0.0 is parsed as 0.0. This is not nice because -0.0 can be generated easily, i.e.
$ nix eval --expr '1.0e-200 * -1.0e-200'
-0
Probably this cannot cause unexpected behavior at the moment because division by 0.0 throws an error.
Nevertheless I think that it makes sense to have consistent IEEE754 doubles.
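For reference, this is what plain IEEE 754 doubles do in the cases from items 2 and 3. It is a minimal C++ sketch, not Nix's code; the helper names `ieee_div` and `reciprocal_twice` are made up for illustration, and it assumes a build without options such as -ffast-math.

```cpp
#include <cmath>
#include <limits>

// Only meaningful where double follows IEEE 754 (IEC 60559).
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 doubles required");

// Plain IEEE 754 division: a zero divisor yields +inf or -inf (NaN for 0/0)
// instead of an error. (Hypothetical helper, not a Nix function.)
double ieee_div(double a, double b) { return a / b; }

// The round trip from item 2: 1 / (1 / x).
double reciprocal_twice(double x) { return ieee_div(1.0, ieee_div(1.0, x)); }
```

With these semantics `reciprocal_twice` maps inf to inf and -inf to -inf, which only works because 1 / -inf keeps its sign as -0.0.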
Handling of subnormal numbers
Parsing subnormal numbers throws an error, e.g.
$ nix eval --expr '1.0e-310'
error: invalid float '1.0e-310'
at «string»:1:1:
1| 1.0e-310
| ^
but they can be generated easily, e.g.
$ nix eval --expr '1.0e-200 * 1.0e-110'
1e-310
The question is: Which behavior is appropriate?
- Shall subnormal numbers be parsed normally?
- Shall subnormal numbers be flushed to zero?
- Shall subnormal numbers throw an error?
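The three options can be sketched in C++ as follows. `SubnormalPolicy` and `apply_subnormal_policy` are hypothetical names for illustration only; nothing like this exists in Nix as far as this issue shows.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical policy names mirroring the three questions above.
enum class SubnormalPolicy { Keep, FlushToZero, Error };

double apply_subnormal_policy(double x, SubnormalPolicy policy) {
    // Normal numbers, zeros, infinities and NaNs pass through untouched.
    if (std::fpclassify(x) != FP_SUBNORMAL) return x;
    switch (policy) {
    case SubnormalPolicy::Keep:
        return x;
    case SubnormalPolicy::FlushToZero:
        return std::copysign(0.0, x); // keep the sign of the flushed value
    case SubnormalPolicy::Error:
        throw std::domain_error("subnormal float");
    }
    return x; // not reached; keeps compilers quiet
}
```

Whatever policy is chosen would have to apply to both parsed literals and computed results, otherwise the inconsistency shown above stays.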
builtins.ceil and builtins.floor are broken
When these functions are applied to a Nix float that lies outside the range of the Nix integer type, -9223372036854775808 = INT64_MIN = -2^63 is returned, e.g.
$ nix eval --expr 'builtins.ceil 1.0e200'
-9223372036854775808
$ nix eval --expr 'builtins.floor 1.0e200'
-9223372036854775808
This makes no sense because the next smaller integer would be INT64_MAX = 2^63 - 1 = 9223372036854775807 for builtins.floor. Probably builtins.ceil should throw an error instead because there is no integer greater than or equal to 1e200 if the return type is to be an integer.
Applying these functions to inf, -inf or NaN returns BS, too:
$ nix eval --expr 'builtins.floor (1.0e200 * 1.0e200)'
-9223372036854775808
$ nix eval --expr 'builtins.floor (1.0e200 * 1.0e200 - 1.0e200 * 1.0e200)'
-9223372036854775808
~~This should throw an error instead.~~ See below.
The next question is why these functions don't return a Nix float. ceil and floor in other languages don't return an integer type.
EDIT: After analyzing the source code, these cases are UB because a double is cast to int64, and the observed behavior is probably specific to x86-64. This should be fixed.
Throwing an error when the argument is outside of the range of integers or NaN is one solution. Then | floor(x) - x | < 1 would hold for every Nix float x (ceil is analogous).
Or the behavior can be saturating, i.e. floor(x) = 2^63-1 for all Nix floats x >= 2^63. The drawback is that | floor(x) - x | wouldn't be bounded anymore.
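Both variants can be sketched in C++ like this. The function names and error messages are made up for illustration and this is not the actual Nix code; the point is only that the range check happens before the cast, so the cast is never UB.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <stdexcept>

// 2^63 is exactly representable as a double; INT64_MAX (2^63 - 1) is not.
constexpr double kTwoPow63 = 9223372036854775808.0;

// Variant 1 (hypothetical): throw when floor(x) does not fit into int64_t.
std::int64_t checked_floor(double x) {
    const double f = std::floor(x);
    // Every comparison with NaN is false, so NaN is rejected here as well.
    if (!(f >= -kTwoPow63 && f < kTwoPow63))
        throw std::range_error("floor: result does not fit into a 64-bit integer");
    return static_cast<std::int64_t>(f); // in range, so the cast is well-defined
}

// Variant 2 (hypothetical): saturate at the integer bounds. NaN still has no
// sensible integer result, so it remains an error.
std::int64_t saturating_floor(double x) {
    if (std::isnan(x)) throw std::range_error("floor: NaN");
    const double f = std::floor(x);
    if (f >= kTwoPow63) return std::numeric_limits<std::int64_t>::max();
    if (f < -kTwoPow63) return std::numeric_limits<std::int64_t>::min();
    return static_cast<std::int64_t>(f);
}
```

A `ceil` counterpart would be analogous with `std::ceil` in place of `std::floor`.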
Metadata
nix (Nix) 2.26.2
0. should be parsed as 0.0
That may be possible.
Computing 1. / (1. / x) throws an error, if x = inf or x = -inf.
I have no idea which behavior is preferable. Nix isn't meant for floating point computations (and I'll shut up about the floats feature being a mistake)
-0.0 is parsed as 0.0. This is not nice because -0.0 can be generated easily, i.e.
Fixing this is technically a breaking change, but if you write -0.0 in an expression, you're probably not writing a serious package. Changing a config could be a problem though.
Not sure.
We could make it an error for a year so that at least users are aware of the impending breakage.
- subnormal numbers
Don't know what to think of this.
builtins.ceil and builtins.floor are broken
Or the behavior can be saturating
Better to throw an error than be sneaky.
The drawback is that | floor(x) - x | wouldn't be bounded anymore.
This is already a problem for numbers within the 64-bit range if you make the additional assumption that the bounds are reasonable.
nix-repl> 123456789012345678 - builtins.floor 123456789012345678
-2
This is already a problem for numbers within the 64-bit range if you make the additional assumption that the bounds are reasonable.
nix-repl> 123456789012345678 - builtins.floor 123456789012345678
-2
This is another problem... floor and ceil cast all inputs to double. A double has only a 52-bit mantissa plus one implied leading 1 bit, i.e. every integer with an absolute value greater than 2^53 may be rounded when it is converted to a double. This explains your example. The rounding error is bounded, though, by 2^10 or 2^9 depending on the rounding mode.
The right behavior would be to pass through all Nix integers, if the functions shall return a Nix integer.
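The difference between the two approaches, as a small C++ sketch. `floor_via_double` only mimics the conversion described above and is not copied from the Nix source; `floor_int` is the hypothetical pass-through.

```cpp
#include <cstdint>

// Mimics the behavior described above: the integer goes through a double.
// Note that for inputs close to INT64_MAX the double rounds up to 2^63 and
// the cast back is itself undefined behavior.
std::int64_t floor_via_double(std::int64_t n) {
    return static_cast<std::int64_t>(static_cast<double>(n));
}

// An integer is already integral, so floor and ceil can return it unchanged.
std::int64_t floor_int(std::int64_t n) { return n; }
```

Under round-to-nearest, `floor_via_double(123456789012345678)` yields 123456789012345680, because doubles in that range are spaced 16 apart; that is exactly the -2 from the example.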
Another problem when rounding is the rounding mode. Here, we can just hope that it is "round to nearest even" on all architectures and that no library or other part of the Nix binary changes it because it is global state.
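Instead of only hoping, the rounding mode can at least be checked or reset through `<cfenv>`. A minimal sketch follows; the two helper names are hypothetical, and strictly conforming code would additionally need `#pragma STDC FENV_ACCESS ON`, which many compilers ignore.

```cpp
#include <cfenv>

// True when the current floating-point environment rounds to nearest.
bool rounds_to_nearest() { return std::fegetround() == FE_TONEAREST; }

// Hypothetical guard: force round-to-nearest and report whether that worked.
bool force_round_to_nearest() { return std::fesetround(FE_TONEAREST) == 0; }
```

Calling such a guard at the entry points of the evaluator would not prevent a library from changing the mode later, so it is a mitigation at best.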
-0.0 is parsed as 0.0. This is not nice because -0.0 can be generated easily, i.e.
Fixing this is technically a breaking change, but if you write -0.0 in an expression, you're probably not writing a serious package. Changing a config could be a problem though. Not sure. We could make it an error for a year so that at least users are aware of the impending breakage.
I don't think that this is a breaking change because 0.0 compares equal to -0.0. According to my analysis this can only affect serializations of a Nix float like builtins.toString ("0.000000" vs. "-0.000000") because the only Nix function where a signed zero could produce a different result is builtins.div, in its second argument (the divisor), but division by zero is an error at the moment.
According to my analysis this can only affect serializations of a Nix float like builtins.toString ("0.000000" vs. "-0.000000")
We kind of have to assume that ends up in a derivation though. We've made that mistake, and I would not like to repeat it.
On a related note: You cannot represent int64 min in nix as a literal. My guess is that the parser only parses the positive part and then negates the result by applying an operation instead of parsing the number directly:
nix-repl> -9223372036854775808
error: invalid integer '9223372036854775808'
at «string»:1:2:
1| -9223372036854775808
| ^
nix-repl> -9223372036854775807 - 1
-9223372036854775808
One more "interesting" case is the canonical NaN representation. On x86 it is negative, but all other platforms produce positive NaNs on invalid operations. Evaluating the following:
let
a = 1000000000000000000000000000000000000000000000000000000000000.00000000000;
inf = a * a * a * a * a * a;
in
{
nan = (inf / inf);
}
On x86_64 results in:
$ nix-instantiate --eval test.nix -A nan
-nan
But for aarch64-linux:
$ qemu-aarch64 result/bin/nix-instantiate --eval test.nix -A nan
nan
One more "interesting" case is the canonical NaN representation. On x86 it is negative, but all other platforms produce positive NaNs on invalid operations.
The same seems to apply to builtins.toXML.
builtins.toJSON is funny, too. It returns "null". I think that it should throw an error instead because NaN cannot be represented as JSON such that it would be round trip safe.
Seems like the only way to salvage the mess that is IEEE 754 implementations is to use a softfloat library configured for a single architecture.
There is no need for a soft float implementation. The only issue is that many representations of NaN exist. The representation should not matter as long as computed NaN values are serialized consistently (a special requirement of Nix), e.g. as "nan", i.e. when a builtin returns a NaN, then this value should be replaced by a specific representation.
Serializing all NaNs as "nan" is a possible solution, but this might hurt round trip safety, e.g. builtins.fromTOML can load NaN values:
nix-repl> builtins.fromTOML "a = -nan"
{ a = -nan; }
nix-repl> builtins.fromTOML "a = nan"
{ a = nan; }
Thus preserving the sign bit makes sense in this case.
only issue is that many representations of NaN exist.
Yeah, for one we'll at least only have to deal with quiet NaNs. Preserving just the sign for those may be good. But what to do about canonical NaNs? Do we say that the canonical NaN for Nix is positive and wrap all operations on x86? That's the only way we'd be able to make eval consistent and preserve the signs.
Either each float result of a computation needs to be checked (I prefer to do this on all platforms because NaN propagation is undefined anyhow, e.g. when applying a binary operator to two NaN operands) and if the result is a NaN, then a canonical NaN representation can be returned instead, i.e. a NaN with cleared sign bit. Not all NaNs would be canonical because builtins.fromTOML can create non-canonical NaNs.
Or the simplest solution: Introduce a total order to NixFloat by converting all NaNs into errors, i.e. throw an exception in mkFloat() with proper error message.
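Both options as a C++ sketch. The function names are hypothetical, and the positive sign of the canonical NaN is just the choice suggested above, not a decision.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

// Option 1 (hypothetical): replace every NaN produced by a computation with
// one canonical quiet NaN whose sign bit is cleared.
double canonicalize_nan(double x) {
    if (!std::isnan(x)) return x;
    return std::copysign(std::numeric_limits<double>::quiet_NaN(), 1.0);
}

// Option 2 (hypothetical): turn every NaN into an error, which makes the
// remaining floats totally ordered.
double reject_nan(double x) {
    if (std::isnan(x)) throw std::domain_error("result is NaN");
    return x;
}
```

Option 1 would be applied to results of arithmetic only, so that NaNs loaded via builtins.fromTOML could keep their sign.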
Introduce a total order to NixFloat by converting all NaNs into errors
I think that ship has sailed at this point. And it's not like it's UB either, that's just implementation defined.
Canonicalising NaNs generated by arithmetic operations on the other hand sounds good. Since now it's totally scuffed and leads to platform-dependent eval it's a necessary bugfix.
The question would be: which sign do we choose to minimise the fallout? Going with the negative sign probably represents more users, since that's what x86 does. On the other hand positive canonical NaNs are more widespread (by architectures) and used by ARM, RISC-V and MIPS, probably PPC as well.
Another nix bug:
(1.e200 * 1.e200 - 1.e200 * 1.e200) <= 0
(1.e200 * 1.e200 - 1.e200 * 1.e200) >= 0
both return true, but the LHS is NaN.
A difference between libstdc++ and libc++ when parsing subnormal numbers with builtins.fromTOML:
No problem with libstdc++
nix-repl> builtins.fromTOML ''a = 1e-308''
{ a = 1e-308; }
but with libc++ an error is thrown
nix-repl> builtins.fromTOML ''a = 1e-308''
error:
… while calling the 'fromTOML' builtin
at «string»:1:1:
1| builtins.fromTOML ''a = 1e-308''
| ^
error: while parsing TOML: [error] toml::parse_floating: failed to read floating point value from stream
--> fromTOML
|
1 | a = 1e-308
| ^-- here
A difference between libstdc++ and libc++ when parsing subnormal numbers with builtins.fromTOML
🫠🫠🫠🫠🫠🫠🫠
As if we haven't had enough TOML fun yet. This is a gift that keeps on giving
both return true, but the LHS is NaN.
That is quite obvious when looking at the code: a <= b gets desugared into !(b < a), and all comparisons with NaN return false. This is a dumpster fire actually and it's much worse than that. Nix floats are so far away from being IEEE754 compliant...
It's great that:
let
nan = (1.e200 * 1.e200 - 1.e200 * 1.e200);
in
{
what = nan <= nan;
the = nan == nan;
heck = nan > nan;
frick = nan >= nan;
omg = nan < nan;
}
Outputs:
{ frick = true; heck = false; omg = false; the = false; what = true; }
A difference between libstdc++ and libc++ when parsing subnormal numbers with builtins.fromTOML
🫠🫠🫠🫠🫠🫠🫠
As if we haven't had enough TOML fun yet. This is a gift that keeps on giving
This can be fixed by letting forceFloat() (is this necessary?) and mkFloat() throw whenever the input is a subnormal number. Similarly, I would check for NaN and inf because inf isn't JSON round trip safe (NaN handling is broken obviously).
std::regex is funny, too. "Sadly" we couldn't find any expression yesterday, which works with both std libraries and returns different results.
std::regex is funny, too. "Sadly" we couldn't find any expression yesterday, which works with both std libraries and returns different results.
Lix has a bunch of those examples actually... It's very cursed
Briefly discussed during today's meeting:
Some things we could do (in arbitrary order):
- Test current slightly cursed behaviour so it doesn't get changed accidentally.
- Canonicalize NaNs to avoid a platform-specific NaN sign. (When an invalid operation as specified by IEEE 754 occurs, a canonical Nix NaN is produced.) Needs a decision on the sign of the canonical NaN (either negative or positive).
- Configurable warnings/diagnostics for potentially dubious scenarios that might introduce eval impurities.