libm
Add assembly version of simple operations on aarch64
For aarch64 and arm64ec with Neon, add assembly versions of the following:
ceil, ceilf, fabs, fabsf, floor, floorf, fma, fmaf, round, roundf, sqrt, sqrtf, trunc, truncf
If the fp16 target feature is available, which implies neon, also include the following:
ceilf16, fabsf16, floorf16, rintf16, sqrtf16, truncf16
Additionally, replace core::arch versions of the following with handwritten assembly (which avoids issues with aarch64be):
rint, rintf
Instructions for fmax and fmin are also available but seem to provide different results based on whether NaN inputs are signaling or quiet. Our current implementation does not make that distinction, so omit these for now.
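To make the NaN concern concrete, here is a minimal software sketch of an fmax that treats quiet and signaling NaNs the same way. The names `is_signaling_nan` and `fmax_ignoring_nan` are hypothetical and this is not the crate's actual code; it only illustrates the behavior that instruction-based versions would need to match.

```rust
/// Hypothetical helper: a NaN is signaling when its exponent is all ones,
/// its mantissa is nonzero, and the top mantissa bit (the quiet bit) is clear.
pub fn is_signaling_nan(x: f64) -> bool {
    let bits = x.to_bits();
    let exp_all_ones = (bits & 0x7ff0_0000_0000_0000) == 0x7ff0_0000_0000_0000;
    let mantissa = bits & 0x000f_ffff_ffff_ffff;
    let quiet_bit = bits & 0x0008_0000_0000_0000;
    exp_all_ones && mantissa != 0 && quiet_bit == 0
}

/// Hypothetical sketch: return the larger argument, and if exactly one
/// argument is NaN (quiet or signaling), return the other one.
pub fn fmax_ignoring_nan(x: f64, y: f64) -> f64 {
    if x.is_nan() {
        y
    } else if y.is_nan() {
        x
    } else if x > y {
        x
    } else {
        y
    }
}
```

With this definition, passing a signaling NaN such as `f64::from_bits(0x7ff0_0000_0000_0001)` or the quiet `f64::NAN` alongside `1.0` yields `1.0` in both cases.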
@Amanieu would you mind double checking the assembly in src/math/arch/aarch64.rs? I am unsure whether preserves_flags should be set; I believe some of these operations may set flags based on the exception control register.
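For readers without the diff open, the general shape of such wrappers might look like the sketch below. This is an assumption about the style, not the contents of src/math/arch/aarch64.rs: the function names `sqrt_arch` and `fma_arch` are hypothetical, the non-aarch64 fallbacks use std only so the example builds everywhere, and `preserves_flags` is left off because that question is still open.

```rust
/// Hypothetical wrapper around the AArch64 `fsqrt` instruction.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn sqrt_arch(mut x: f64) -> f64 {
    // SAFETY: `fsqrt` reads and writes only the named register.
    unsafe {
        core::arch::asm!(
            "fsqrt {x:d}, {x:d}",
            x = inout(vreg) x,
            // `preserves_flags` intentionally omitted (unresolved above).
            options(pure, nomem, nostack),
        );
    }
    x
}

/// Hypothetical wrapper around `fmadd`, which computes `x * y + z`
/// with a single rounding.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn fma_arch(mut x: f64, y: f64, z: f64) -> f64 {
    // SAFETY: `fmadd` reads and writes only the named registers.
    unsafe {
        core::arch::asm!(
            "fmadd {x:d}, {x:d}, {y:d}, {z:d}",
            x = inout(vreg) x,
            y = in(vreg) y,
            z = in(vreg) z,
            options(pure, nomem, nostack),
        );
    }
    x
}

/// Fallbacks for other targets, for illustration only.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn sqrt_arch(x: f64) -> f64 {
    x.sqrt()
}

#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn fma_arch(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}
```

Using `inout` on a single register keeps each wrapper to one instruction; whether the flags option is safe to add depends on the answer to the question above.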
Cc @hanna-kruppe, while I was working on the others I also replaced the rint vector implementation.
However, I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic, which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins, then it might be better to keep libm soft-float only.
At least some of these are used internally within libm by functions that still need to exist on hard-float targets. For example, floor is used by rem_pio2_large which is needed by many trigonometric functions.
(Plus the benefits for non-compiler-builtins consumers, who are not the main point of this crate but it’s still nice-to-have.)
For the operations that are used internally, the ideal end state that we want is for libm to use the float methods from core, which will then be lowered by LLVM to the appropriate instructions.
Is there any harm in taking the improvement now and revisiting once those methods are actually available in core?
My only motivation here is fma: some of the incoming CORE-math routines rely on it, and I wanted a more accurate icount comparison without soft fma before mul_add is available in core. Nothing else is important; I just included the other simple ops since they are reasonably trivial.
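As a small illustration of why routines of this kind lean on fma, the sketch below recovers the rounding error of a product exactly, which an unfused multiply followed by a subtraction cannot do. It uses std's `f64::mul_add` as a stand-in, and the function names are hypothetical, not taken from CORE-math or libm.

```rust
/// Hypothetical helper: returns the rounded product and its exact rounding
/// error, relying on `mul_add` rounding only once.
pub fn two_product(a: f64, b: f64) -> (f64, f64) {
    let p = a * b; // rounded product
    let err = a.mul_add(b, -p); // exact a*b - p, thanks to the single rounding
    (p, err)
}

/// The same computation without a fused operation: the product is rounded
/// before the subtraction, so the error term is lost.
pub fn two_product_unfused(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;
    (p, a * b - p)
}
```

For example, with `a = 1 + 2^-30` the exact square is `1 + 2^-29 + 2^-60`; the last term does not fit in an f64 near 1, so the fused version reports an error of `2^-60` while the unfused version reports `0`.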
I dropped most of this change but kept:
rint, because the SIMD calls are preexisting; this is causing issues with cg_gcc
sqrt and fma, because they are used for a lot of other routines. This is mostly for direct users of libm until math in core is stable.