HIP
Question about basic rounded operations
Can you please explain how the macro OCML_BASIC_ROUNDED_OPERATIONS is defined?
    error: use of undeclared identifier '__fsqrt_rz'; did you mean '__fsqrt_rn'?
        return __fsqrt_rz(f);
               ^~~~~~~~~~
               __fsqrt_rn
In addition, are there HIP/ROCm functions corresponding to the inline assembly instructions in the following CUDA code?
    // optimized version of DP rsqrt(a) provided by Norbert Juffa
    __device__ double fast_rsqrt(double a)
    {
        double x, e, t;
        float f;
        asm ("cvt.rn.f32.f64       %0, %1;" : "=f"(f) : "d"(a));
        asm ("rsqrt.approx.ftz.f32 %0, %1;" : "=f"(f) : "f"(f));
        asm ("cvt.f64.f32          %0, %1;" : "=d"(x) : "f"(f));
        t = __dmul_rn (x, x);
        e = __fma_rn (a, -t, 1.0);
        t = __fma_rn (0.375, e, 0.5);
        e = __dmul_rn (e, x);
        x = __fma_rn (t, e, x);
        return x;
    }
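For reference, the refinement part of this routine does not depend on PTX at all. Below is a minimal sketch of the same computation in plain C++ that should also compile as CUDA or HIP device code. The names fast_rsqrt_portable and FAST_RSQRT_HD are hypothetical, introduced only for this illustration. It assumes that an exact single-precision 1.0f / sqrt(f) is an acceptable stand-in for rsqrt.approx.ftz.f32 as the starting guess, so it does not reproduce the speed or the flush-to-zero behavior of the hardware instruction. On the device, rsqrtf or __frsqrt_rn from the HIP math API might be closer substitutes, but whether they match the PTX instruction in accuracy and denormal handling is not something this sketch establishes.

```cpp
#include <cmath>

// Hypothetical helper macro: marks the function host+device when compiled
// with hipcc or nvcc, and expands to nothing for a plain C++ compiler.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define FAST_RSQRT_HD __host__ __device__
#else
#define FAST_RSQRT_HD
#endif

// Double-precision reciprocal square root: single-precision seed followed by
// one third-order correction step, mirroring the structure of fast_rsqrt.
FAST_RSQRT_HD inline double fast_rsqrt_portable(double a)
{
    // Seed: round to float, take 1/sqrt in float, widen back to double.
    // Assumption: this replaces the three inline PTX instructions
    // (cvt.rn.f32.f64, rsqrt.approx.ftz.f32, cvt.f64.f32).
    float f = static_cast<float>(a);
    double x = static_cast<double>(1.0f / std::sqrt(f));

    // Refinement: with e = 1 - a*x*x, return x + x*e*(0.5 + 0.375*e).
    // Note: unlike __dmul_rn, a plain '*' may be fused into an FMA by the
    // compiler depending on its floating-point contraction settings.
    double t = x * x;
    double e = std::fma(a, -t, 1.0);
    t = std::fma(0.375, e, 0.5);
    e = e * x;
    x = std::fma(t, e, x);
    return x;
}
```

As in the original, the input is first rounded to float, so arguments outside the single-precision range overflow or underflow in the seed and are not handled here.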
Thank you