Nix language: inconsistent handling of numbers

Open NaN-git opened this issue 9 months ago • 19 comments

Describe the bug

After noticing #12848 I started to investigate other strange corner cases of the Nix language related to numbers. This issue lists several corner cases, which should be either documented or fixed.

Handling of 0.0

1.

1. is parsed as 1.0 (here, I'm adding the point to indicate a Nix float), while 0. isn't parsed, i.e.

$ nix eval --expr '0.'
error: syntax error, unexpected end of file, expecting ID or OR_KW or DOLLAR_CURLY or '"'
       at «string»:1:3:
            1| 0.
             |   ^

Expected behavior: 0. should be parsed as 0.0

2.

1 / 0.0 throws an error instead of returning inf:

$ nix eval --expr '1 / 0.0'
error:
       … while calling the 'div' builtin
         at «string»:1:3:
            1| 1 / 0.0
             |   ^

       error: division by zero

Computing 1. / (1. / x) throws an error, if x = inf or x = -inf. inf can be generated easily:

$ nix eval --expr '1.0e200 * 1.0e200'
inf

3.

-0.0 is parsed as 0.0. This is not nice because -0.0 can be generated easily, i.e.

$ nix eval --expr '1.0e-200 * -1.0e-200'
-0

Probably this cannot cause unexpected behavior at the moment because division by 0.0 throws an error. Nevertheless I think that it makes sense to have consistent IEEE754 doubles.

Handling of subnormal numbers

Parsing subnormal numbers throws an error, e.g.

$ nix eval --expr '1.0e-310'
error: invalid float '1.0e-310'
       at «string»:1:1:
            1| 1.0e-310
             | ^

but they can be generated easily, e.g.

$ nix eval --expr '1.0e-200 * 1.0e-110'
1e-310

The question is: Which behavior is appropriate?

  • Shall subnormal numbers be parsed normally?
  • Shall subnormal numbers be flushed to zero?
  • Shall subnormal numbers throw an error?
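To make the three options concrete, here is a minimal sketch in Python (whose `float` is an IEEE 754 double). The helper names `is_subnormal` and `apply_policy` and the policy strings are made up for illustration; this is not how Nix is implemented.

```python
import math
import sys

# Smallest positive *normal* double, 2**-1022 (about 2.2250738585072014e-308).
MIN_NORMAL = sys.float_info.min


def is_subnormal(x: float) -> bool:
    """True for nonzero values whose magnitude is below the smallest normal double."""
    return x != 0.0 and abs(x) < MIN_NORMAL


def apply_policy(x: float, policy: str) -> float:
    """Hypothetical helper modelling the three options from the list above."""
    if not is_subnormal(x):
        return x
    if policy == "keep":    # treat subnormals like any other float
        return x
    if policy == "flush":   # flush to zero, preserving the sign
        return math.copysign(0.0, x)
    if policy == "error":   # reject subnormals outright
        raise ValueError(f"subnormal float: {x!r}")
    raise ValueError(f"unknown policy: {policy!r}")
```

Whichever policy is chosen, it would have to apply to parsed literals and to computed results alike, otherwise the inconsistency shown above remains.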

builtins.ceil and builtins.floor are broken

When these functions are applied to a Nix float that lies outside the range of the Nix integer type, they return -9223372036854775808 = INT64_MIN = -2^63, e.g.

$ nix eval --expr 'builtins.ceil 1.0e200'
-9223372036854775808
$ nix eval --expr 'builtins.floor 1.0e200'
-9223372036854775808

This makes no sense: for builtins.floor, the next smaller integer would be INT64_MAX = 2^63 - 1 = 9223372036854775807. builtins.ceil should probably throw an error instead, because there is no integer greater than or equal to 1e200 if the return type has to be an integer.

Applying these functions to inf, -inf or NaN returns BS, too:

$ nix eval --expr 'builtins.floor (1.0e200 * 1.0e200)'
-9223372036854775808
$ nix eval --expr 'builtins.floor (1.0e200 * 1.0e200 - 1.0e200 * 1.0e200)'
-9223372036854775808

~~This should throw an error instead.~~ See below.

The next question is why these functions don't return a Nix float. ceil and floor in other languages don't return an integer type.

EDIT: After analyzing the source code, these cases are UB because a double is cast to int64, and the observed behavior is probably specific to x86-64. This should be fixed.

Throwing an error when the argument is outside of the range of integers or NaN is one solution. Then | floor(x) - x | < 1 would hold for every Nix float x (ceil is analogous). Or the behavior can be saturating, i.e. floor(x) = 2^63-1 for all Nix floats x >= 2^63. The drawback is that | floor(x) - x | wouldn't be bounded anymore.
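The two alternatives can be sketched roughly as follows. This is Python, not the Nix source: `floor_checked` and `floor_saturating` are hypothetical names, and since Python integers are unbounded the int64 range check has to be written out explicitly. A C++ version would need to compare the double against the bounds before casting.

```python
import math

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def floor_checked(x: float) -> int:
    """Variant 1: raise for NaN, +/-inf and anything outside the int64 range."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError(f"cannot convert {x!r} to an integer")
    result = math.floor(x)  # exact, because Python ints are arbitrary precision
    if not (INT64_MIN <= result <= INT64_MAX):
        raise OverflowError(f"floor({x!r}) does not fit into a 64-bit integer")
    return result


def floor_saturating(x: float) -> int:
    """Variant 2: clamp to the int64 range; NaN still has no sensible result."""
    if math.isnan(x):
        raise ValueError("cannot convert NaN to an integer")
    if math.isinf(x):
        return INT64_MAX if x > 0 else INT64_MIN
    return max(INT64_MIN, min(INT64_MAX, math.floor(x)))
```

With the checked variant, every value that is returned satisfies | floor(x) - x | < 1. With the saturating variant, `floor_saturating(1.0e200)` returns 2^63 - 1 and the bound is lost.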

Metadata

nix (Nix) 2.26.2

NaN-git avatar Apr 02 '25 21:04 NaN-git

0. should be parsed as 0.0

That may be possible.

Computing 1. / (1. / x) throws an error, if x = inf or x = -inf.

I have no idea which behavior is preferable. Nix isn't meant for floating point computations (and I'll shut up about the floats feature being a mistake)

-0.0 is parsed as 0.0. This is not nice because -0.0 can be generated easily, i.e.

Fixing this is technically a breaking change, but if you write -0.0 in an expression, you're probably not writing a serious package. Changing a config could be a problem though. Not sure. We could make it an error for a year so that at least users are aware of the impending breakage.

  • subnormal numbers

Don't know what to think of this.

builtins.ceil and builtins.floor are broken

Or the behavior can be saturating

Better to throw an error than be sneaky.

The drawback is that | floor(x) - x | wouldn't be bounded anymore.

This is already a problem for numbers within the 64-bit range if you make the additional assumption that the bounds are reasonable.

nix-repl> 123456789012345678 - builtins.floor 123456789012345678  
-2

roberth avatar Apr 07 '25 11:04 roberth

This is already a problem for numbers within the 64-bit range if you make the additional assumption that the bounds are reasonable.

nix-repl> 123456789012345678 - builtins.floor 123456789012345678  
-2

This is another problem... floor and ceil are casting all inputs to double. double has only a 52-bit mantissa with one implied 1 bit, i.e. every integer with an absolute value greater than 2^53 could be rounded, when it is converted to a double. This explains your example. Though the rounding error is bounded by 2^10 or 2^9 depending on the rounding mode.

The right behavior would be to pass through all Nix integers, if the functions shall return a Nix integer.
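A small Python model of the difference, assuming the reported behaviour is equivalent to converting every input to a double first. The function names are invented for this sketch.

```python
import math


def floor_via_double(x):
    """Model of the reported behaviour: every input goes through a double first."""
    return math.floor(float(x))


def floor_passthrough(x):
    """Proposed behaviour: integers are returned unchanged, only floats are rounded."""
    if isinstance(x, int):
        return x
    return math.floor(x)


n = 123456789012345678
# Doubles between 2**56 and 2**57 are spaced 16 apart, so n is rounded
# to the nearest multiple of 16, which is n + 2.
difference = n - floor_via_double(n)
```

Here `difference` is -2, matching the repl output quoted above, while `floor_passthrough(n)` returns `n` unchanged.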

Another problem when rounding is the rounding mode. Here, we can just hope that it is "round to nearest even" on all architectures and that no library or other part of the Nix binary changes it because it is global state.

NaN-git avatar Apr 07 '25 12:04 NaN-git

-0.0 is parsed as 0.0. This is not nice because -0.0 can be generated easily, i.e.

Fixing this is technically a breaking change, but if you write -0.0 in an expression, you're probably not writing a serious package. Changing a config could be a problem though. Not sure. We could make it an error for a year so that at least users are aware of the impending breakage.

I don't think that this is a breaking change because 0.0 compares equal to -0.0. According to my analysis this can only affect serializations of a Nix float like builtins.toString ("0.000000" vs. "-0.000000"). The only Nix function where a signed zero could produce a different result is builtins.div, when the zero is the second argument (the divisor), but division by zero is an error at the moment.
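For illustration, the same distinction in Python, whose floats are also IEEE 754 doubles. `sign_bit` and `fixed` are hypothetical helpers, and the printf-style `%f` is used only because it produces the same six-decimal form as the strings quoted above.

```python
import math


def sign_bit(x: float) -> bool:
    """True if the IEEE 754 sign bit is set (works for zeros as well)."""
    return math.copysign(1.0, x) < 0.0


def fixed(x: float) -> str:
    """printf-style fixed notation with six decimals."""
    return "%f" % x


zeros_compare_equal = (0.0 == -0.0)       # comparison ignores the sign of zero
underflowed = 1.0e-200 * -1.0e-200        # underflows to negative zero
serializations = (fixed(0.0), fixed(underflowed))
```

The two zeros compare equal, but `serializations` contains two different strings, which is the only place the sign becomes observable here.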

NaN-git avatar Apr 08 '25 17:04 NaN-git

According to my analysis this can only affect serializations of a Nix float like builtins.toString ("0.000000" vs. "-0.000000")

We kind of have to assume that ends up in a derivation though. We've made that mistake, and I would not like to repeat it.

roberth avatar Apr 08 '25 20:04 roberth

On a related note: You cannot represent int64 min in nix as a literal. My guess is that the parser only parses the positive part and then negates the result by applying an operation instead of parsing the number directly:

nix-repl> -9223372036854775808  
error: invalid integer '9223372036854775808'
       at «string»:1:2:
            1| -9223372036854775808
             |  ^

nix-repl> -9223372036854775807 - 1
-9223372036854775808
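A toy model of that guess in Python. This is not the actual Nix parser, and both function names are hypothetical: the digits are range-checked as a positive number first and negated afterwards.

```python
INT64_MAX = 2**63 - 1


def parse_int_token(digits: str) -> int:
    """Parse an unsigned integer token and reject anything above INT64_MAX."""
    value = int(digits)
    if value > INT64_MAX:
        raise ValueError(f"invalid integer '{digits}'")
    return value


def eval_negated_literal(source: str) -> int:
    """Evaluate '-<digits>' as a negation applied to a separately parsed token."""
    if not source.startswith("-"):
        raise ValueError("expected a leading '-'")
    return -parse_int_token(source[1:])
```

Under this model `eval_negated_literal("-9223372036854775808")` fails because 9223372036854775808 alone is out of range, while `eval_negated_literal("-9223372036854775807") - 1` succeeds.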

oddlama avatar Apr 12 '25 09:04 oddlama

One more "interesting" case is the canonical NaN representation. On x86 it is negative, but all other platforms produce positive NaNs on invalid operations. Evaluating the following:

let
  a = 1000000000000000000000000000000000000000000000000000000000000.00000000000;
  inf = a * a * a * a * a * a;
in
{
  nan = (inf / inf);
}

On x86_64 results in:

$ nix-instantiate --eval test.nix -A nan
-nan

But for aarch64-linux:

$ qemu-aarch64 result/bin/nix-instantiate --eval test.nix -A nan
nan

xokdvium avatar Jul 24 '25 22:07 xokdvium

One more "interesting" case is the canonical NaN representation. On x86 it is negative, but all other platforms produce positive NaNs on invalid operations.

The same seems to apply to builtins.toXML.

builtins.toJSON is funny, too. It returns "null". I think that it should throw an error instead because NaN cannot be represented as JSON such that it would be round trip safe.
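As a point of comparison only, this is what a strict serializer looks like with Python's standard `json` module, which has nothing to do with the JSON code in Nix. `to_json_strict` is a made-up wrapper name.

```python
import json


def to_json_strict(value) -> str:
    """Serialize to JSON and raise ValueError for NaN and +/-inf."""
    # allow_nan=False makes json.dumps reject values that JSON cannot represent.
    return json.dumps(value, allow_nan=False)
```

With this wrapper, NaN and the infinities raise an error, whereas ordinary floats round-trip through `json.loads`.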

NaN-git avatar Jul 24 '25 23:07 NaN-git

Seems like the only way to salvage the mess that is IEEE 754 implementations is to use a softfloat library configured for a single architecture.

xokdvium avatar Jul 24 '25 23:07 xokdvium

There is no need for a soft float implementation. The only issue is that many representations of NaN exist. The representation should not matter as long as computed NaN values are serialized consistently (a special requirement of Nix), e.g. as "nan", i.e. when a builtin returns a NaN, then this value should be replaced by a specific representation.

Serializing all NaNs as "nan" is a possible solution, but this might hurt round trip safety, e.g. builtins.fromTOML can load NaN values:

nix-repl> builtins.fromTOML "a = -nan"                                                                             
{ a = -nan; }
nix-repl> builtins.fromTOML "a = nan"                                                                             
{ a = nan; }

Thus preserving the sign bit makes sense in this case.

NaN-git avatar Jul 25 '25 00:07 NaN-git

only issue is that many representations of NaN exist.

Yeah, for one we'll at least only have to deal with quiet NaNs. Preserving just the sign for those may be good. But what to do about canonical NaNs? Do we say that the canonical NaN for Nix is positive and wrap all operations on x86? That's the only way we'd be able to make eval consistent and preserve the signs.

xokdvium avatar Jul 25 '25 01:07 xokdvium

One option is to check each float result of a computation and, if it is a NaN, return a canonical NaN representation instead, i.e. a NaN with a cleared sign bit. I would prefer to do this on all platforms because NaN propagation is undefined anyhow, e.g. when applying a binary operator to two NaN operands. Not all NaNs would be canonical, because builtins.fromTOML can create non-canonical NaNs.
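A rough sketch of such a check, written in Python with `struct` to get at the bit pattern. The bit pattern chosen as canonical (quiet NaN, sign bit clear, zero payload) is an assumption for this example, and so are the helper names.

```python
import math
import struct

# Assumed canonical form: quiet NaN with cleared sign bit and zero payload.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def canonicalize(x: float) -> float:
    """Replace any NaN by the canonical NaN; leave every other value untouched."""
    if math.isnan(x):
        return struct.unpack("<d", struct.pack("<Q", CANONICAL_NAN_BITS))[0]
    return x
```

Applying `canonicalize` to the result of each arithmetic builtin would make the sign of computed NaNs independent of the platform, while values loaded by other means could still carry a different sign.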

Or the simplest solution: Introduce a total order to NixFloat by converting all NaNs into errors, i.e. throw an exception in mkFloat() with proper error message.

NaN-git avatar Jul 25 '25 20:07 NaN-git

Introduce a total order to NixFloat by converting all NaNs into errors

I think that ship has sailed at this point. And it's not like it's UB either, that's just implementation defined.

Canonicalising NaNs generated by arithmetic operations on the other hand sounds good. Since now it's totally scuffed and leads to platform-dependent eval it's a necessary bugfix.

The question would be: which sign do we choose to minimise the fallout? Going with the negative sign probably represents more users, since that's what x86 does. On the other hand positive canonical NaNs are more widespread (by architectures) and used by ARM, RISC-V and MIPS, probably PPC as well.

xokdvium avatar Jul 25 '25 20:07 xokdvium

Another nix bug:

(1.e200 * 1.e200 - 1.e200 * 1.e200) <= 0
(1.e200 * 1.e200 - 1.e200 * 1.e200) >= 0

both return true, but the LHS is NaN.

NaN-git avatar Sep 07 '25 16:09 NaN-git

A difference between libstdc++ and libc++ when parsing subnormal numbers with builtins.fromTOML:

No problem with libstdc++

nix-repl> builtins.fromTOML ''a = 1e-308''
{ a = 1e-308; }

but with libc++ an error is thrown

nix-repl> builtins.fromTOML ''a = 1e-308'' 
error:
       … while calling the 'fromTOML' builtin
         at «string»:1:1:
            1| builtins.fromTOML ''a = 1e-308''
             | ^

       error: while parsing TOML: [error] toml::parse_floating: failed to read floating point value from stream
        --> fromTOML
          |
        1 | a = 1e-308
          |           ^-- here

NaN-git avatar Sep 08 '25 21:09 NaN-git

A difference between libstdc++ and libc++ when parsing subnormal numbers with builtins.fromTOML

🫠🫠🫠🫠🫠🫠🫠

As if we haven't had enough TOML fun yet. This is a gift that keeps on giving

xokdvium avatar Sep 08 '25 21:09 xokdvium

both return true, but the LHS is NaN.

Quite obvious when looking at the code: a <= b gets desugared into !(b < a), and all comparisons with NaNs return false. This is a dumpster fire actually and it's much worse than that. Nix floats are so far away from being IEEE754 compliant...

It's great that:

let
  nan = (1.e200 * 1.e200 - 1.e200 * 1.e200);
in
{
  what = nan <= nan;
  the = nan == nan;
  heck = nan > nan;
  frick = nan >= nan;
  omg = nan < nan;
}

Outputs:

{ frick = true; heck = false; omg = false; the = false; what = true; }
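The effect of the desugaring can be reproduced outside of Nix. In this Python sketch `le_desugared` follows the rewrite quoted above, and `ge_desugared` is assumed to be the analogous rewrite (it matches the output shown, but I have not checked it against the source). Python's own `<=` follows IEEE 754 and serves as the reference.

```python
def le_desugared(a: float, b: float) -> bool:
    """a <= b rewritten as !(b < a)."""
    return not (b < a)


def ge_desugared(a: float, b: float) -> bool:
    """a >= b rewritten as !(a < b); assumed by analogy."""
    return not (a < b)


nan = float("nan")
results = {
    "desugared nan <= nan": le_desugared(nan, nan),  # True, like `what` above
    "desugared nan >= nan": ge_desugared(nan, nan),  # True, like `frick` above
    "ieee nan <= nan": nan <= nan,                   # False
    "ieee nan >= nan": nan >= nan,                   # False
}
```

For ordinary numbers both forms agree. They only diverge when a NaN is involved, because negating an always-false comparison yields true.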

xokdvium avatar Sep 08 '25 21:09 xokdvium

A difference between libstdc++ and libc++ when parsing subnormal numbers with builtins.fromTOML

🫠🫠🫠🫠🫠🫠🫠

As if we haven't had enough TOML fun yet. This is a gift that keeps on giving

This can be fixed by letting forceFloat() (is this necessary?) and mkFloat() throw whenever the input is a subnormal number. Similarly, I would check for NaN and inf because inf isn't JSON round trip safe (NaN handling is broken obviously).

std::regex is funny, too. "Sadly" we couldn't find any expression yesterday, which works with both std libraries and returns different results.

NaN-git avatar Sep 08 '25 22:09 NaN-git

std::regex is funny, too. "Sadly" we couldn't find any expression yesterday, which works with both std libraries and returns different results.

Lix has a bunch of those examples actually... It's very cursed

xokdvium avatar Sep 08 '25 23:09 xokdvium

Briefly discussed during today's meeting:

Some things we could do (in arbitrary order):

  • Test current slightly cursed behaviour so it doesn't get changed accidentally.
  • Canonicalize NaNs to avoid a platform-specific NaN sign (when an invalid operation as specified by IEEE 754 occurs, a canonical Nix NaN is produced). Needs a decision on the sign of the canonical NaN (either negative or positive).
  • Configurable warnings/diagnostics for potentially dubious scenarios that might introduce eval impurities.

xokdvium avatar Nov 26 '25 20:11 xokdvium