f*::min/max do not behave as documented for signaling NaN on aarch64, or with optimizations

Open RalfJung opened this issue 1 month ago • 9 comments

Our documentation says that max(SNaN, 0.0) should return 0.0. On x86_64, this is indeed what happens. However, on aarch64, we get a QNaN as a result instead. On Android, the reported result for this test is:

F32::from(F32::nan(Sign::Pos, NaNKind::Signaling, 1).as_f32().max(0.0)) = 0x7fc00001 (NaN: Pos, Quiet, payload = 0x1)
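To read the bit pattern in that report, here is a minimal sketch of a NaN decoder for `f32`. The helper name `decode_nan` is hypothetical, and the sketch assumes the common convention that the top mantissa bit (bit 22) is the quiet bit. The `max` call in `main` prints whatever the current target and optimization level produce, so no particular output is claimed for it.

```rust
use std::hint::black_box;

/// Decode an f32 bit pattern as a NaN.
/// Returns (negative, quiet, payload), or None if `bits` is not a NaN.
/// Hypothetical helper; assumes bit 22 is the quiet bit.
fn decode_nan(bits: u32) -> Option<(bool, bool, u32)> {
    const EXP_MASK: u32 = 0x7f80_0000;
    const MANT_MASK: u32 = 0x007f_ffff;
    const QUIET_BIT: u32 = 0x0040_0000;

    // A NaN has an all-ones exponent and a non-zero mantissa.
    let is_nan = (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0;
    if !is_nan {
        return None;
    }
    let negative = (bits >> 31) == 1;
    let quiet = (bits & QUIET_BIT) != 0;
    // The payload is the mantissa without the quiet bit.
    let payload = bits & MANT_MASK & !QUIET_BIT;
    Some((negative, quiet, payload))
}

fn main() {
    // The value from the report above: positive, quiet, payload 0x1.
    println!("{:?}", decode_nan(0x7fc0_0001));

    // A positive signaling NaN with payload 1, fed through `max`.
    // `black_box` keeps the inputs opaque to the optimizer.
    let snan = f32::from_bits(0x7f80_0001);
    let r = black_box(snan).max(black_box(0.0));
    println!("max(SNaN, 0.0) = {:#010x} -> {:?}", r.to_bits(), decode_nan(r.to_bits()));
}
```

If the second line prints `None`, the result was a number (the documented 0.0); if it prints a tuple, the target returned a NaN.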

It seems that LLVM never actually correctly implemented the promise that "if exactly one argument is NaN, the other argument is returned". Recently things got a lot more messy (see https://github.com/llvm/llvm-project/issues/170082 and also this discussion), but apparently things were already broken before this recent chaos.

Also, since Rust 1.91, optimizations seem to fold max(SNaN, x) into NaN.

Cc @tgross35 @valadaptive @nikic @workingjubilee @rust-lang/libs-api

RalfJung avatar Dec 02 '25 08:12 RalfJung

should we just stop trying to make these poorly-conceived intrinsics work and just use minimum and maximum instead?

workingjubilee avatar Dec 02 '25 18:12 workingjubilee

> should we just stop trying to make these poorly-conceived intrinsics work and just use minimum and maximum instead?

Do you mean {min,max}imumnum? I think that's the ideal case, but buggy LLVM optimizations and potential lost performance are holding it back. (I expect that the result of the LLVM discourse discussion will be that we will know exactly which intrinsic to use for our desired behavior, then anything not matching it will be an LLVM bug.)

tgross35 avatar Dec 02 '25 21:12 tgross35

{min,max}imumnum will never be very good on x86 because they require ordering -0.0 below +0.0 and x86 does not have an instruction for that.

My understanding is that we can get the NaN behavior of {min,max}imumnum (that's what we already document), but not the signed-zero behavior.

RalfJung avatar Dec 02 '25 21:12 RalfJung

I personally like the idea of having min/max do whatever is fastest on the current target, which essentially means that:

  • if one input is NaN then we non-deterministically return either an arbitrary NaN or the other input value.
  • if the inputs are two zeros of different signs then we non-deterministically return either 0.0 or -0.0.

This works well for the vast majority of cases where people don't care about NaN or signed zeros. We could then have specialized functions which guarantee one of the 3 IEEE 754 behaviors:

  • minimum/maximum
    • if any input is NaN, return NaN
  • minimum_number/maximum_number (2019)
    • if any input is NaN, return the other, or an arbitrary NaN if both are NaN
  • min_num/max_num (2008)
    • if any input is sNaN, return qNaN
    • otherwise: if any input is NaN, return the other, or an arbitrary NaN if both are NaN
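The three NaN behaviors in this list can be sketched in plain software as follows. This is an illustration only, not how std or LLVM implement them: the free functions `maximum`, `maximum_number`, and `max_num` are hypothetical names, the quiet bit is assumed to be mantissa bit 22, and the signed-zero handling in `larger` (ordering -0.0 below +0.0) is an assumption applied to all three for simplicity.

```rust
/// True if `x` is a signaling NaN (quiet bit, assumed to be bit 22, is clear).
fn is_snan(x: f32) -> bool {
    x.is_nan() && (x.to_bits() & 0x0040_0000) == 0
}

/// Larger of two non-NaN values, treating -0.0 as less than +0.0.
fn larger(a: f32, b: f32) -> f32 {
    if a > b {
        a
    } else if b > a {
        b
    } else if a.is_sign_negative() {
        // Equal values: prefer the one with a positive sign.
        b
    } else {
        a
    }
}

/// minimum/maximum flavor: any NaN input gives NaN.
fn maximum(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() { f32::NAN } else { larger(a, b) }
}

/// 2019 maximum_number flavor: a single NaN input is ignored.
fn maximum_number(a: f32, b: f32) -> f32 {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => f32::NAN,
        (true, false) => b,
        (false, true) => a,
        (false, false) => larger(a, b),
    }
}

/// 2008 max_num flavor: an sNaN input gives qNaN, a qNaN input is ignored.
fn max_num(a: f32, b: f32) -> f32 {
    if is_snan(a) || is_snan(b) { f32::NAN } else { maximum_number(a, b) }
}

fn main() {
    let snan = f32::from_bits(0x7f80_0001);
    for (name, x) in [("sNaN", snan), ("qNaN", f32::NAN)] {
        println!(
            "{name} vs 1.0: maximum={} maximum_number={} max_num={}",
            maximum(x, 1.0),
            maximum_number(x, 1.0),
            max_num(x, 1.0)
        );
    }
}
```

The only input class on which the 2019 and 2008 flavors differ in this sketch is a signaling NaN paired with a number.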

The downsides are that this would be a breaking change and that it would be hard to explain the differences between these similarly-named functions to users.

Amanieu avatar Dec 03 '25 02:12 Amanieu

> {min,max}imumnum will never be very good on x86 because they require ordering -0.0 below +0.0 and x86 does not have an instruction for that.

The current lowering in LLVM is pretty bad, especially on the default x86-64(-v1) microarchitecture level, since it uses "select" operations and LLVM really wants to lower them to branches. https://github.com/llvm/llvm-project/pull/170069 improves this a fair bit, and makes the operations easily vectorizable. I believe WebAssembly engines implement a similar lowering, although last I checked, it's not identical.

I surveyed the range of behaviors and available hardware operations when working on a SIMD library (https://github.com/linebender/fearless_simd/issues/133), although keep in mind that this was before I discovered all the LLVM weirdness and the sNaN behavior divergence. There are a bajillion different sets of semantics here, and none of them are optimal in all cases.

valadaptive avatar Dec 03 '25 03:12 valadaptive

> if any input is sNaN, return qNaN

Note that LLVM generally reserves the right to treat an sNaN like a qNaN, so for any function that distinguishes between sNaN and qNaN, an sNaN input can always non-deterministically also generate the qNaN behavior. Currently, we have only very few such functions -- specifically, powf and powi. I'd rather avoid adding more, so I don't think we should have any operation that has the 2008 NaN behavior. (It's also non-associative to make matters worse.)
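The non-associativity can be seen with a small software model of the 2008 rule. This is a sketch under assumptions: `min_num_2008` is a hypothetical helper written for illustration, not an existing std or LLVM function, and bit 22 is assumed to be the quiet bit.

```rust
/// True if `x` is a signaling NaN (quiet bit, assumed to be bit 22, is clear).
fn is_snan(x: f32) -> bool {
    x.is_nan() && (x.to_bits() & 0x0040_0000) == 0
}

/// Software model of the 2008 minNum NaN rule (hypothetical helper).
fn min_num_2008(a: f32, b: f32) -> f32 {
    if is_snan(a) || is_snan(b) {
        return f32::NAN; // sNaN input: produce a quiet NaN
    }
    if a.is_nan() {
        return b; // qNaN input: return the other operand
    }
    if b.is_nan() {
        return a;
    }
    if a < b { a } else { b }
}

fn main() {
    let snan = f32::from_bits(0x7f80_0001);

    // (sNaN min 1.0) becomes qNaN, which the outer call then ignores.
    let left = min_num_2008(min_num_2008(snan, 1.0), 2.0);
    // Here the sNaN reaches the outer call and turns the result into qNaN.
    let right = min_num_2008(snan, min_num_2008(1.0, 2.0));

    println!("left = {left}, right = {right}");
}
```

In this model the left grouping evaluates to 2.0 while the right grouping evaluates to NaN, so regrouping the same three operands changes the result.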

RalfJung avatar Dec 03 '25 07:12 RalfJung

Seems like with optimizations, we also get the wrong behavior on x86: Playground

[src/main.rs:9:5] max(f32::NAN, 0.0) = 0.0
[src/main.rs:10:5] max(0.0, f32::NAN) = 0.0
[src/main.rs:11:5] max(F32_SNAN, 0.0) = NaN
[src/main.rs:12:5] max(0.0, F32_SNAN) = NaN

If I disable inlining for max, then the NaN results turn into 0.0.
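A sketch of how one might compare the two cases locally; this is not the Playground code, which is not reproduced here. The wrappers `max_inlined` and `max_outlined` and the sNaN bit pattern 0x7fa00000 are assumptions chosen for illustration. What gets printed depends on the toolchain, target, and optimization level, so no output is claimed.

```rust
/// Wrapper the optimizer is asked to inline, so it can see constant arguments.
#[inline(always)]
fn max_inlined(a: f32, b: f32) -> f32 {
    a.max(b)
}

/// Wrapper that is never inlined, so `max` is evaluated on opaque arguments.
#[inline(never)]
fn max_outlined(a: f32, b: f32) -> f32 {
    a.max(b)
}

/// The two outcomes discussed in this thread: the documented 0.0, or a NaN.
fn is_zero_or_nan(r: f32) -> bool {
    r == 0.0 || r.is_nan()
}

fn main() {
    // Assumed signaling NaN bit pattern (quiet bit clear, non-zero mantissa).
    let snan = f32::from_bits(0x7fa0_0000);

    let results = [
        ("inlined  max(SNaN, 0.0)", max_inlined(snan, 0.0)),
        ("inlined  max(0.0, SNaN)", max_inlined(0.0, snan)),
        ("outlined max(SNaN, 0.0)", max_outlined(snan, 0.0)),
        ("outlined max(0.0, SNaN)", max_outlined(0.0, snan)),
    ];
    for (name, r) in results {
        println!("{name} = {r} (bits {:#010x})", r.to_bits());
    }
}
```

Building once with and once without optimizations, and comparing the inlined and outlined rows, should show whether constant folding changes the result on a given toolchain.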

RalfJung avatar Dec 06 '25 08:12 RalfJung

The x86 behavior seems to have regressed between Rust 1.90 and Rust 1.91 -- @nikic does that correspond to an LLVM update?

RalfJung avatar Dec 06 '25 08:12 RalfJung

Yes, Rust 1.91 is LLVM 21.

nikic avatar Dec 06 '25 15:12 nikic

So it seems like the way forward is to emit minimumnum + nsz instead of minnum. What is currently preventing us from doing so?

  • Do we get the intended codegen on x86?
  • I assume this lowers to fminimum_num libcalls if there's no specific support in the backend; is that a problem in terms of libm support?

RalfJung avatar Dec 12 '25 14:12 RalfJung

@nikic is it worth backporting https://github.com/llvm/llvm-project/pull/170181 to our LLVM fork to fix the x86 regression?

RalfJung avatar Dec 12 '25 15:12 RalfJung

> So it seems like the way forward is to emit minimumnum + nsz instead of minnum. What is currently preventing us from doing so?

I believe minimumnum has some legalization failures that were only fixed in LLVM 22. So I think we'll only want to switch after the next LLVM update.

> I assume this lowers to fminimum_num libcalls if there's no specific support in the backend; is that a problem in terms of libm support?

I think we'll want to provide a fminimum_num implementation in compiler-builtins. Relying on it being present in libm would indeed be an issue.

> @nikic is it worth backporting https://github.com/llvm/llvm-project/pull/170181 to our LLVM fork to fix the x86 part of this regression?

As it would still be broken on other arches (including mainstream ones like AArch64), I'd say probably not worth it?

nikic avatar Dec 12 '25 15:12 nikic

> I think we'll want to provide a fminimum_num implementation in compiler-builtins. Relying on it being present in libm would indeed be an issue.

Cc @tgross35 about whether that's feasible

> As it would still be broken on other arches (including mainstream ones like AArch64), I'd say probably not worth it?

Fixing it for one mainstream arch (x86) is better than nothing IMO. In particular, for x86 this is a regression, for AArch64 it is not.

RalfJung avatar Dec 12 '25 15:12 RalfJung

@Amanieu

> I personally like the idea of having min/max do whatever is fastest on the current target, which essentially means that:
>
>   • if one input is NaN then we non-deterministically return either an arbitrary NaN or the other input value.
>   • if the inputs are two zeros of different signs then we non-deterministically return either 0.0 or -0.0.

That's not quite it, I think. At least, if we look at what is being proposed for LLVM, the least constrained option is minnum + nsz which amounts to

  • if one input is SNaN then we non-deterministically return either an arbitrary NaN or the other input value.
  • if one input is QNaN then we return the other input value.
  • if the inputs are two zeros of different signs then we non-deterministically return either +0.0 or -0.0.

This is exactly the semantics of fmin in C (taking into account that C generally doesn't talk about signaling NaNs).
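These rules can be written down as a predicate that says whether a given result is permitted. A minimal sketch: `allowed_min` is a hypothetical helper, bit 22 is assumed to be the quiet bit, and the case where both inputs are NaN (not covered by the bullets above) is assumed to permit any NaN.

```rust
/// True if `x` is a signaling NaN (quiet bit, assumed to be bit 22, is clear).
fn is_snan(x: f32) -> bool {
    x.is_nan() && (x.to_bits() & 0x0040_0000) == 0
}

/// Is `r` a permitted result of min(a, b) under the relaxed rules above?
/// Hypothetical helper for illustration.
fn allowed_min(a: f32, b: f32, r: f32) -> bool {
    let same = |x: f32, y: f32| x.to_bits() == y.to_bits();
    match (a.is_nan(), b.is_nan()) {
        // Assumption: with two NaN inputs, any NaN result is acceptable.
        (true, true) => r.is_nan(),
        (true, false) | (false, true) => {
            let (nan, other) = if a.is_nan() { (a, b) } else { (b, a) };
            // The other input is always fine; an arbitrary NaN only for sNaN.
            same(r, other) || (is_snan(nan) && r.is_nan())
        }
        (false, false) => {
            if a == 0.0 && b == 0.0 {
                // Two zeros (of any signs): either zero may be returned.
                r == 0.0
            } else {
                same(r, if a < b { a } else { b })
            }
        }
    }
}

fn main() {
    let snan = f32::from_bits(0x7f80_0001);
    let cases = [(snan, 1.0), (f32::NAN, 1.0), (0.0, -0.0), (3.0, 2.0)];
    for (a, b) in cases {
        let r = a.min(b);
        println!("min({a}, {b}) = {r}: allowed = {}", allowed_min(a, b, r));
    }
}
```

A predicate like this is one way to state what a test suite could check on every target: under these rules, returning NaN for a quiet NaN input would be rejected, while both 0.0 and NaN would pass for a signaling NaN input.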

Compared to what we have documented, this is a breaking change "just" for SNaN inputs. This is also the semantics that we already (accidentally) use on aarch64, that we sometimes now get on x86 if optimizations kick in, and that we had for several years on x86-glibc if the libcall fallback is used (which seems to be the case with -Copt-level=z).

Also note that the 2008 minNum semantics (if any input is sNaN, return qNaN) are not representable in LLVM without using constrained floating point intrinsics, because LLVM generally has the license to treat SNaN inputs as-if they were QNaN, and the license to return SNaN when generating a NaN output of an operation with an SNaN input.

RalfJung avatar Dec 12 '25 15:12 RalfJung