libm
Potential to add #[inline] attributes where possible
I noticed there is a PR #210 that removed most #[inline] annotations. However, while developing one of my crates, I found that Rust's stdlib cbrtf was a significant bottleneck. I tried using this crate instead, but it gave minimal benefit because its functions were not being inlined. cbrtf was called in a hot loop, and the loop could not be autovectorized because every cbrtf call had to go through a call instruction.
If I add the #[inline] attribute, the compiler is able to inline the function and autovectorize the loop, which gave a 25% performance improvement. I would therefore propose re-adding the #[inline] attribute where possible. (#[inline(always)] is not needed; plain #[inline] is enough to let the compiler consider it.)
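To illustrate the pattern being described, here is a minimal sketch, assuming a hypothetical pure-Rust cube root (`cbrtf_approx`, a bit-level initial guess refined by Newton-Raphson). It is NOT libm's cbrtf implementation. The point is only where `#[inline]` goes: on the library function, so that a caller's loop in another crate (the hypothetical `cbrt_all`) can have the body inlined instead of emitting a call per element. Whether LLVM then actually vectorizes the loop depends on the function body and target.

```rust
/// Hypothetical pure-Rust cube root, NOT libm's implementation.
/// `#[inline]` makes the body available for inlining into callers
/// in other crates; the compiler still decides whether to inline.
#[inline]
pub fn cbrtf_approx(x: f32) -> f32 {
    // Zero, infinities and NaN are returned unchanged.
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let ax = x.abs();
    // Bit-level initial guess: roughly divides the exponent by 3.
    let mut y = f32::from_bits(ax.to_bits() / 3 + 0x2a51_37a0);
    // Three Newton-Raphson steps solving y^3 = ax.
    for _ in 0..3 {
        y = (2.0 * y + ax / (y * y)) / 3.0;
    }
    // Restore the sign of the input (cube root of a negative is negative).
    y.copysign(x)
}

/// Hypothetical caller: a hot loop over a slice. If `cbrtf_approx`
/// is inlined, the loop body contains no opaque call instruction.
pub fn cbrt_all(values: &mut [f32]) {
    for v in values.iter_mut() {
        *v = cbrtf_approx(*v);
    }
}

fn main() {
    let mut data = vec![1.0f32, 8.0, 27.0, -64.0, 1000.0];
    cbrt_all(&mut data);
    println!("{:?}", data);
}
```

Comparing the generated assembly (for example with `cargo asm` or on Compiler Explorer) with and without the attribute is one way to check whether the call disappears from the loop.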
Edit: I tried the same with powf in the other places my crate uses it, and there it had a strongly negative effect, so this turns out not to be a universal win at all :thinking:
If you've found a specific case where the lack of #[inline] causes a performance problem for you then we are happy to accept a PR to add the #[inline] back.
I benchmarked every function with #[inline] vs. as-is, and there are a few considerable improvements. On my machine (x86_64 Arch Linux, Ryzen 7 5800H) I got the following significant results, though I'd appreciate someone on a different system/architecture running the benchmarks as well:
atan: 3 -> 2 ns/iter
atan2: 6 -> 5 ns/iter
cosf: 4 -> 2 ns/iter
coshf: 4 -> 3 ns/iter
expm1f: 3 -> 2 ns/iter
hypot: 4 -> 2 ns/iter
hypotf: 2 -> 1 ns/iter
sinf: 5 -> 2 ns/iter
tan: 4 -> 3 ns/iter
tanf: 7 -> 2 ns/iter
tanh: 8 -> 6 ns/iter
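To put the figures above on a common scale, the relative improvement (before - after) / before can be computed for each function. This is a small sketch using only the numbers reported in this thread; the `RESULTS` table and `improvement_pct` helper are illustrative names, not part of libm.

```rust
/// Figures copied from the results above: (function, ns/iter as-is, ns/iter with #[inline]).
const RESULTS: &[(&str, u32, u32)] = &[
    ("atan", 3, 2),
    ("atan2", 6, 5),
    ("cosf", 4, 2),
    ("coshf", 4, 3),
    ("expm1f", 3, 2),
    ("hypot", 4, 2),
    ("hypotf", 2, 1),
    ("sinf", 5, 2),
    ("tan", 4, 3),
    ("tanf", 7, 2),
    ("tanh", 8, 6),
];

/// Relative improvement in percent: (before - after) / before * 100.
fn improvement_pct(before: u32, after: u32) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

fn main() {
    for &(name, before, after) in RESULTS {
        println!(
            "{name:>7}: {before} -> {after} ns/iter ({:.1}% faster)",
            improvement_pct(before, after)
        );
    }
}
```

Because the figures are whole nanoseconds, a 1 ns change on a 2 to 6 ns baseline already counts as a 16.7% to 50% improvement, so the percentages are coarse and averaging over more runs should make them more reliable.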
I only included results where #[inline] helped by more than 10%. Before opening a proper PR I will do more runs and average the results; for now I just want to put this out there.
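For the "more runs and average" step, one option on stable Rust is a tiny timing loop built on `std::time::Instant` and `std::hint::black_box`. This is a minimal sketch, not the harness used for the numbers above; `mean_ns_per_iter` is a hypothetical helper, and `f32::sin` stands in for whichever function is being measured (to compare libm, swap in the crate's function with and without `#[inline]`).

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: time `iters` calls of `f` per run, repeat `runs`
/// times, and return the mean nanoseconds per call across all runs.
fn mean_ns_per_iter(runs: u32, iters: u32, mut f: impl FnMut(f32) -> f32) -> f64 {
    let mut total_ns = 0.0f64;
    for run in 0..runs {
        let start = Instant::now();
        for i in 0..iters {
            // black_box keeps the optimizer from constant-folding or
            // deleting the call whose cost we are trying to measure.
            let _ = black_box(f(black_box(i as f32 * 0.001 + run as f32)));
        }
        // Per-run average in ns per call.
        total_ns += start.elapsed().as_nanos() as f64 / iters as f64;
    }
    // Mean of the per-run averages.
    total_ns / runs as f64
}

fn main() {
    let mean = mean_ns_per_iter(10, 100_000, |x| x.sin());
    println!("sin: {mean:.2} ns/iter (mean of 10 runs)");
}
```

Reporting the per-run spread alongside the mean would also show whether a 1 ns difference is outside the noise.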