Add `fmaf16`
Split from https://github.com/rust-lang/libm/pull/390 since I think the f128 version will be trickier.
@beetrees or @tspiteri would you be able to help me figure this out? The version of `fmaf16` in this PR says `fma(0x0001, 0x3bf7, 0x0000) = 0x0001` (6e-8 * 0.9956 + 0 = 6e-8, the same result as `x * y`), which agrees with what LLVM does via `fmaf`. However, Rug says the result should be 0x0000. Is this usage incorrect?
```rust
#![feature(f16)]

use az::Az;
use rug::float::Round;

fn main() {
    let x = f16::from_bits(0x0001);
    let y = f16::from_bits(0x3bf7);
    let z = f16::from_bits(0x0000);

    let mut xf = rug::Float::with_val(11, x);
    let yf = rug::Float::with_val(11, y);
    let zf = rug::Float::with_val(11, z);

    let ordering = xf.mul_add_round(&yf, &zf, Round::Nearest);
    xf.subnormalize_ieee_round(ordering, Round::Nearest);

    let rug_res = (&xf).az::<f16>();
    println!("rug: {xf} {rug_res:?}");
    println!("std: {:?}", x.mul_add(y, z));
}
```
Prints:
```
rug: 5.9343e-8 0x0000
std: 0x0001
```
Also checked against Julia to get a third opinion, which agrees with 0x0001.
```julia
> x = fma(reinterpret(Float16, 0x0001), reinterpret(Float16, 0x3bf7), reinterpret(Float16, 0x0000))
Float16(6.0e-8)

> reinterpret(UInt16, x)
0x0001
```
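For reference, here is a quick standalone arithmetic check (mine, not part of the PR) of why 0x0001 looks like the correctly rounded result. Decoding the bit patterns gives 0x3bf7 = 2039/2048 and 0x0001 = 2^-24, so the exact product is 2039 * 2^-35:

```rust
// Standalone sanity check (not from the PR): the exact product
// 0x0001 * 0x3bf7 is 2039 * 2^-35; the two candidate f16 results are
// 0x0000 (zero) and 0x0001 (2^-24 = 2048 * 2^-35).
fn main() {
    let product: u64 = 2039; // exact product, scaled by 2^35
    let min_subnormal: u64 = 2048; // 2^-24, scaled by 2^35
    // Distance to 2^-24 is 9 * 2^-35; distance to zero is 2039 * 2^-35,
    // so round-to-nearest gives 2^-24, i.e. bit pattern 0x0001.
    assert!(min_subnormal - product < product);
}
```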
It looks like what is happening there is that `subnormalize{_ieee}{_round}` propagate values smaller than the smallest subnormal unmodified, which is very unintuitive, but it is also more or less behaving as documented. The docs of the `_ieee` variants aren't explicit about what the subnormal range is, and I would have expected it to cover any exponent that is small enough.
I created an issue: https://gitlab.com/tspiteri/rug/-/issues/78
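As a minimal sketch of that quirk (standalone code, assuming a rug release from before the fix mentioned later in this thread), the exact product above, 2039 * 2^-35, sits below the smallest f16 subnormal and passes through unchanged:

```rust
use rug::Float;

fn main() {
    // Exact product from the report above: 2^-24 * (2039/2048) = 2039 * 2^-35,
    // which is smaller than the smallest f16 subnormal (2^-24 = 2048 * 2^-35)
    // but much closer to 2^-24 than to zero.
    let mut x = Float::with_val(11, 2039); // 11-bit value, stored exactly
    x >>= 35; // divide by 2^35

    // With the quirk described above the value is propagated unmodified;
    // after the fix it is rounded up to 2^-24 (about 5.9605e-8).
    x.subnormalize_ieee();
    println!("{x}");
}
```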
Great find, thanks for looking into it. So this is a problem with `subnormalize` in general then, not specifically the `_ieee` versions?
From a quick check it seems like this could be worked around by reducing and then re-expanding the precision, which shouldn't reallocate. Any idea whether there is a better way?
```rust
fn main() {
    let mut xf = rug::Float::with_val(24, f32::from_bits(1));
    xf *= 0.75;
    subnormalize(&mut xf);
    xf *= 16.0;
    subnormalize(&mut xf);
    assert_eq!(xf.to_f32(), f32::from_bits(1) * 0.75 * 16.0);
}

fn subnormalize(xf: &mut rug::Float) {
    let e = xf.get_exp().unwrap();
    if e < -126 {
        // Temporarily shrink the precision so the bits below the f32
        // subnormal range are rounded away, then restore it.
        xf.set_prec(24_u32.saturating_sub(-e as u32 - 126));
        xf.set_prec(24);
    }
}
```
The `subnormalize_round` call was needed in the first place because it avoids double rounding by taking as input the direction in which the Float was previously rounded. Any workaround will need to handle that as well. I'll add a specific test case to the rug issue.
Yes, all the variants of `subnormalize` have the same quirk, but the general ones (without `_ieee`) document it more explicitly.
After the change mentioned in Rug issue 78, which is now in the master branch, the code above prints:

```
rug: 5.9605e-8 0x0001
std: 0x0001
```
Thank you for the quick fix, it seems to work great! The failure in the extensive tests looks like a real failure (interestingly, Julia returns the same result, while std's implementation returns the MPFR result).
```
---- mp_extensive_fmaf16 ----
input:    (0x8f10, 0x0488, 0x83e8) (0x8f10, 0x0488, 0x83e8)
expected: 0x83e9 0x83e9
actual:   0x83e8 0x83e8

Caused by:
    ulp 1 > 0
```
Indeed the expected result is correct. This is a case of `a * b + c` where:

- `c` is subnormal
- `a * b` is just slightly larger than half the least subnormal
- `a * b + c`, when computed in f32, rounds to the halfway point between `c = 0x83e8` and the next representable number `0x83e9`
I'm not entirely sure where exactly the algorithm goes wrong, but I think it is because the test for halfway cases doesn't account for the sum being subnormal, so the f32 result (which is not subnormal) has even more excess precision.
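To make those bullet points concrete, here is a standalone check of the arithmetic (mine, not from the PR), with everything written as integer multiples of 2^-46; the bit patterns decode to a = -1808 * 2^-22, b = 1160 * 2^-24 and c = -1000 * 2^-24:

```rust
fn main() {
    // a = 0x8f10 = -1808 * 2^-22, b = 0x0488 = 1160 * 2^-24,
    // c = 0x83e8 = -1000 * 2^-24 (subnormal).
    let ab: i64 = -1808 * 1160; // a * b, scaled by 2^46
    // |a * b| = 2^-25 + 2^-39: just over half of the least subnormal 2^-24.
    assert_eq!(ab, -(1 << 21) - (1 << 7));

    let sum: i64 = ab + (-1000 << 22); // a * b + c, scaled by 2^46
    // Halfway point between c = 0x83e8 (-2000 * 2^-25) and 0x83e9 (-2002 * 2^-25).
    let halfway: i64 = -2001 << 21;
    // The exact sum is past the halfway point by 2^-39, so round-to-nearest
    // must give 0x83e9; rounding to f32 first lands exactly on the halfway
    // point, and ties-to-even then picks 0x83e8 instead.
    assert!(sum < halfway);
}
```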
Recreated as https://github.com/rust-lang/compiler-builtins/pull/861