compiler-builtins
Non-musl `f32` functions?
Are you interested in non-musl implementations, especially f32 functions that are faster on modern architectures? I am writing math functions in Rust from scratch, based on my own research since 2017:
https://github.com/jdh8/metallic-rs
My functions are tested to be faithfully rounded (error < 1 ulp). Unlike most C libraries, I assume that f64 instructions are available even for f32 functions and also try to use FMA if possible. I also keep LUTs to a minimum with WebAssembly in mind.
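The "use f64 inside f32 functions" idea can be sketched roughly as follows. This hypothetical `exp2_f32_via_f64` is illustrative only, not metallic's actual code: it uses a short Taylor-style polynomial and ignores overflow, underflow, and NaN, so a real kernel would need a proper minimax fit and edge-case handling.

```rust
// Sketch: evaluate an f32 function entirely in f64 and round to f32 once
// at the end. Because f64 carries more than twice the precision of f32,
// a modestly accurate f64 polynomial already yields a good f32 result.
fn exp2_f32_via_f64(x: f32) -> f32 {
    let x = f64::from(x);
    // Split x = n + f so that 2^x = 2^n * 2^f with |f| <= 0.5.
    let n = x.round();
    let f = x - n;
    // Degree-4 Taylor polynomial for 2^f about 0 (coefficients ln(2)^k / k!).
    let p = 1.0
        + f * (std::f64::consts::LN_2
        + f * (0.24022650695910072
        + f * (0.05550410866482158
        + f * 0.009618129107628477)));
    // Scale by 2^n (no overflow handling here) and round to f32 exactly once.
    (p * 2.0f64.powi(n as i32)) as f32
}

fn main() {
    assert_eq!(exp2_f32_via_f64(3.0), 8.0); // exact at integer inputs
    println!("2^0.5 ≈ {}", exp2_f32_via_f64(0.5));
}
```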
Benchmarks show that my implementations are faster than musl (libm) and glibc (std on Linux), except for libm::logf:
https://github.com/jdh8/metallic-rs/actions/runs/10342894582/job/28626374065
@jdh8 do you have any specific examples of which specific functions you are talking about? Supplying math symbols that are not exposed in std is more or less a non-goal of our libm.
I have published metallic, so its docs are publicly available.
https://docs.rs/metallic/latest/metallic/index.html
For now I have implemented cbrt, exp, exp2, exp10, exp_m1, frexp, hypot, ldexp, ln, ln_1p, log2, log10, and round for f32.
Functions not yet in std
exp10
f32::exp10/metallic::f32::exp10 time: [1.4450 ns 1.4468 ns 1.4491 ns]
f32::exp10/libm::exp10f time: [8.7490 ns 8.7607 ns 8.7763 ns]
frexp
f32::frexp/metallic::f32::frexp time: [522.51 ps 523.83 ps 525.21 ps]
f32::frexp/libm::frexpf time: [2.1766 ns 2.1782 ns 2.1802 ns]
ldexp
f32::ldexp/metallic::f32::ldexp time: [731.73 ps 732.56 ps 733.40 ps]
f32::ldexp/libm::ldexpf time: [8.5147 ns 8.5303 ns 8.5475 ns]
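Since frexp and ldexp are absent from std, a sketch of their semantics may help. The names `frexp_sketch`/`ldexp_sketch` are illustrative bit-twiddling versions for normal, finite inputs, not metallic's or libm's actual implementations (which must also handle zeros, subnormals, infinities, and NaN).

```rust
// frexp splits x into m * 2^e with m in [0.5, 1); ldexp computes m * 2^e.
fn frexp_sketch(x: f32) -> (f32, i32) {
    let bits = x.to_bits();
    // Unbiased exponent, shifted by one so the mantissa lands in [0.5, 1).
    let e = ((bits >> 23) & 0xff) as i32 - 126;
    // Overwrite the exponent field with that of 0.5 (biased exponent 126).
    let m = f32::from_bits((bits & 0x807f_ffff) | (126 << 23));
    (m, e)
}

fn ldexp_sketch(m: f32, e: i32) -> f32 {
    // Valid while 127 + e stays within the normal exponent range [1, 254].
    m * f32::from_bits(((127 + e) as u32) << 23)
}

fn main() {
    assert_eq!(frexp_sketch(6.0), (0.75, 3)); // 6 = 0.75 * 2^3
    assert_eq!(ldexp_sketch(0.75, 3), 6.0);
}
```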
Examples of functions already in std
cbrt
f32::cbrt/metallic::f32::cbrt time: [1.2175 ns 1.2204 ns 1.2240 ns]
f32::cbrt/libm::cbrtf time: [1.5684 ns 1.5699 ns 1.5717 ns]
f32::cbrt/f32::cbrt time: [15.171 ns 15.177 ns 15.184 ns]
exp
f32::exp/metallic::f32::exp time: [1.4475 ns 1.4538 ns 1.4635 ns]
f32::exp/libm::expf time: [9.4960 ns 9.5048 ns 9.5178 ns]
f32::exp/f32::exp time: [4.2191 ns 4.2225 ns 4.2269 ns]
hypot
f32::hypot/metallic::f32::hypot time: [770.30 ps 771.21 ps 772.27 ps]
f32::hypot/libm::hypotf time: [3.4183 ns 3.4193 ns 3.4204 ns]
f32::hypot/f32::hypot time: [3.7229 ns 3.7271 ns 3.7328 ns]
log2
f32::log2/metallic::f32::log2 time: [1.6325 ns 1.6335 ns 1.6347 ns]
f32::log2/libm::log2f time: [5.7100 ns 5.7170 ns 5.7267 ns]
f32::log2/f32::log2 time: [3.2606 ns 3.2647 ns 3.2721 ns]
Is your crate no_std compatible?
I tried, but it is not yet no_std compatible. I use functions from std to access FMA and rounding instructions:
f32::mul_add:
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L64
f32::round_ties_even:
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L204
f64::mul_add:
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f64/mod.rs#L16
f64::round_ties_even:
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L182
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L239
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L275
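The std dependency matters because f32::mul_add performs a*b + c with a single rounding (a hardware FMA where available), while the plain expression a * b + c rounds twice and can lose the low-order part entirely. A minimal demonstration:

```rust
// Demo: one rounding (mul_add) vs two roundings (a * b + c).
fn main() {
    let (a, b, c) = (1.0f32 + f32::EPSILON, 1.0f32 - f32::EPSILON, -1.0f32);
    // The exact value of a*b + c is -EPSILON^2 = -2^-46.
    let two_roundings = a * b + c;      // a*b rounds to 1.0, so this is 0.0
    let one_rounding = a.mul_add(b, c); // keeps the -2^-46
    assert_eq!(two_roundings, 0.0);
    assert!(one_rounding < 0.0);
    println!("{two_roundings} vs {one_rounding}");
}
```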
Which platform are you doing the performance comparison on? Looking at the performance numbers, your math builtins don't appear to be coming from Rust libm (they are presumably coming from the system libc); it would be good to know which libc your system uses, for comparison purposes.
You've stated that the accuracy of all the functions is error < 1 ULP; is this just an upper bound, or can this amount of error actually occur for each function? Using this paper, it's possible to compare against the accuracy of the MUSL versions of cbrtf, expf, hypotf, log2f, and exp10f (note that Rust libm's expf and log2f implementations are out of date compared to MUSL upstream). Notably, all of MUSL's functions except exp10f have a maximum error smaller than 1 ULP (with the caveat that it's impossible to exhaustively test the bivariate hypotf; see the paper for details), with cbrtf being correctly rounded (error <= 0.5 ULP) in all cases.
I ask because both the CORE-MATH project and LLVM libc provide correctly rounded (error <= 0.5 ULP) implementations of these functions with competitive performance (a performance comparison with glibc is displayed on the CORE-MATH project website). All else being equal, it would be good to move towards Rust libm having correctly-rounded implementations where possible.
I'm using ubuntu-latest, the GitHub Actions runner image.
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/.github/workflows/rust.yml#L23
I have only tested that the errors are < 1 ulp; I have not yet computed the maximum error for each function. https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/tests/f32/main.rs#L36
Wow, correctly rounded transcendental functions exist! I dare not dream about this because of the Table Maker's Dilemma.
Re. the correctly rounded transcendental functions and the table maker's dilemma: this paper goes into how the dilemma is dealt with. (For single-argument f32 functions in particular, it's possible to do an exhaustive computation using every possible input.)
True, exhaustive computation for univariate f32 functions is pretty feasible; I've already tested them exhaustively against their f64 counterparts.
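The exhaustive-test idea can be sketched like this: enumerate every f32 bit pattern in a range and compare against an f64 reference, tracking the worst error in f32 ulps. This is an illustrative harness, not metallic's test code; the ulp computation assumes positive, normal results, and a full sweep over all 2^32 inputs would be done offline rather than scanning the small slice shown here.

```rust
// Measure the worst-case error of f against an f64 reference, in f32 ulps,
// over the inclusive bit-pattern range [lo, hi].
fn max_ulp_error(
    reference: impl Fn(f64) -> f64,
    f: impl Fn(f32) -> f32,
    lo: f32,
    hi: f32,
) -> f64 {
    let mut worst = 0.0f64;
    for bits in lo.to_bits()..=hi.to_bits() {
        let x = f32::from_bits(bits);
        let got = f64::from(f(x));
        let want = reference(f64::from(x));
        // One f32 ulp at the reference value (positive normal results only).
        let w = want as f32;
        let ulp = f64::from(f32::from_bits(w.to_bits() + 1) - w);
        worst = worst.max(((got - want) / ulp).abs());
    }
    worst
}

fn main() {
    // std's sqrt is correctly rounded, so the measured error is ~0.5 ulp.
    let err = max_ulp_error(f64::sqrt, |x| x.sqrt(), 1.0, 1.01);
    assert!(err <= 0.51);
    println!("max error over [1, 1.01]: {err} ulp");
}
```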
If correct rounding is a goal, I don't think I can make anything better than CORE-MATH. Maybe we should port CORE-MATH or LLVM libc instead? I've read several math functions from CORE-MATH. These functions are already using my assumptions such as calling f64 instructions in f32 functions. (C has floating-point contraction so it does not need special care for FMA.)
I've also been thinking of porting from CORE-MATH: the implementations seem to be self-contained (making them easier to port) and well tested against the GNU MPFR library. FYI, per https://github.com/rust-lang/rust/issues/128386#issuecomment-2282249890 I believe @alexpyattaev is currently looking into porting a new implementation of f64 log.
For the record, Rust has good bindings to MPFR; see rug.
I've regained my confidence! There is still room for optimization because we can ignore the floating-point environment. I'm upgrading my library to correct rounding.
I also have a project that's semi-related to this issue: I have developed an implementation of the Lambert W function: https://crates.io/crates/lambert_w. Would integrating this capability be interesting? It is no_std and has 50 bits of accuracy according to the paper it is based on. It currently only has support for f64, hence "semi-related".
Thanks, but as far as I know Lambert's W is not in the standard library, and this issue is about functions in std. Whether it belongs in std or not is a separate question entirely.