ARM NEON - vfmapn_vf_vf_vf_vf
Hi Shibata-san,
My colleague Nikita Astafev has a fix for the SLEEF intrinsic vfmapn_vf_vf_vf_vf (and its double-precision counterpart vfmapn_vd_vd_vd_vd) in helperadvsimd.h. The change is necessary when the result is zero:
diff --git a/src/arch/helperadvsimd.h b/src/arch/helperadvsimd.h
index 582ebfd..e626b51 100644
--- a/src/arch/helperadvsimd.h
+++ b/src/arch/helperadvsimd.h
@@ -193,7 +193,7 @@ static INLINE VECTOR_CC vfloat vfmanp_vf_vf_vf_vf(vfloat x, vfloat y, vfloat z)
}
static INLINE VECTOR_CC vfloat vfmapn_vf_vf_vf_vf(vfloat x, vfloat y, vfloat z) { // x * y - z
- return vneg_vf_vf(vfmanp_vf_vf_vf_vf(x, y, z));
+ return vfma_vf_vf_vf_vf(x, y, vneg_vf_vf(z));
}
// Reciprocal 1/x, Division, Square root
@@ -405,7 +405,7 @@ static INLINE VECTOR_CC vdouble vfmanp_vd_vd_vd_vd(vdouble x, vdouble y, vdouble
}
static INLINE VECTOR_CC vdouble vfmapn_vd_vd_vd_vd(vdouble x, vdouble y, vdouble z) { // x * y - z
- return vneg_vd_vd(vfmanp_vd_vd_vd_vd(x, y, z));
+ return vfma_vd_vd_vd_vd(x, y, vneg_vd_vd(z));
}
// Reciprocal 1/x, Division, Square root
Okay, but are you adding new functions based on the helpers? I am interested in what functions people like you want to add to SLEEF.
Hi Shibata-san,
Our situation is a little more complicated than just using your SLEEF routines. We have adopted your intrinsic notation and accompanying header files for all our new CPU math intrinsics. As you know, we support our HPC SDK on x86_64, ppc64le, and aarch64 Linux, along with a soon-to-be-released x86_64 Windows HPC SDK. Abstracting out the architecture dependencies when writing our new routines has greatly improved development time and subsequent support.
For an example of what we're doing, see: https://github.com/flang-compiler/flang/tree/master/runtime/libpgmath/lib/common/log10
Why don't you just use the log10 functions in SLEEF? I haven't carefully looked into your implementation, but is it faster?
Please also check out the CUDA support, which was recently added to SLEEF: https://github.com/shibatch/sleef/pull/337