libm
libm copied to clipboard
Potential to add #[inline} attributes where possible
I noticed there is a PR #210 that removed most #[inline]
annotations. However, in developing one of my crates, I found that Rust's stdlib cbrtf
was a significant bottleneck. I attempted to use this crate, but it provided minimal benefit due to the functions not being inlined. The bottleneck was due to the fact that cbrtf
was being called in a loop, and could not be autovectorized because it had to be accessed via a call
instruction.
If I add the #[inline]
attribute, my function is able to inline and autovectorize the code, which resulted in a 25% performance improvement. Therefore, I would move to consider readding the #[inline]
attribute where possible. (#[inline(always)]
is not needed, only #[inline]
to allow the compiler to consider it.)
Edit: I tried doing the same with powf
for other places it's used in the crate, and in that case it had a very negative effect... so this turns out to not be a universal thing at all :thinking:
If you've found a specific case where the lack of #[inline]
causes a performance problem for you then we are happy to accept a PR to add the #[inline]
back.
I benchmarked every function with #[inline]
vs. as-is, and there are a few considerable improvements. On my machine (x86_64 Arch Linux, Ryzen 7 5800H) I got these significant results - though I'd appreciate someone on a different system/architecture running a bench as well:
-
atan
: 3 -> 2 ns/iter -
atan2
: 6 -> 5 ns/iter -
cosf
: 4 -> 2 ns/iter -
coshf
: 4 -> 3 ns/iter -
expm1f
: 3 -> 2 ns/iter -
hypot
: 4 -> 2 ns/iter -
hypotf
: 2 -> 1 ns/iter -
sinf
: 5 -> 2 ns/iter -
tan
: 4 -> 3 ns/iter -
tanf
: 7 -> 2 ns/iter -
tanh
: 8 -> 6 ns/iter
I only included results where #[inline]
helped by >10% but I will do more runs and average out results before doing any proper PR, for now just want to put this out there