_mm_rsqrt_ss not matching simde_mm_rsqrt_ss fail

Open YileKu opened this issue 1 year ago • 8 comments

A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00

   auto mul_A0 = _mm_mul_ss(A0, A0);
   auto mul_B0 = _mm_mul_ss(B0, B0);
   auto add_ss = _mm_add_ss(mul_A0, mul_B0);

mul_a0: 00 00 10 41 00 00 00 00 00 00 00 00 00 00 00 00
mul_b0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
add_ss: 00 00 20 41 00 00 00 00 00 00 00 00 00 00 00 00

   auto root = _mm_rsqrt_ss( add_ss );

root: 00 E0 A1 2E 00 00 00 00 00 00 00 00 00 00 00 00

On a Cortex-A72 using simde_mm_rsqrt_ss:

A0: 00 00 40 40 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00
add_ss: 00 00 20 41 00 00 00 00 00 00 00 00 00 00 00 00

root: 00 80 A1 3E 00 00 00 00 00 00 00 00 00 00 00 00

YileKu avatar Sep 15 '24 23:09 YileKu

This code gives different results when run natively on Intel and when built with the simd-everywhere headers on a Cortex-A72:

   void ldump_debug (char *t, void *_d, int len)
   {
      fprintf(stdout, "%s: ", t);
      unsigned char *cp = (unsigned char *)_d;
      for (int i = 0; i < len; i++, cp++)
         fprintf(stdout, "%02X ", *cp);
      fprintf(stdout, "\n");
   }

   __m128 t = { 0x00002041, 00, 00, 00 };
   auto out = _mm_rsqrt_ss(t);
   ldump_debug("LOCAL", &out, sizeof(out));

On Cortex-A72: LOCAL: 00 00 34 3C 00 00 00 00 00 00 00 00 00 00 00 00
On Intel:      LOCAL: 00 48 34 3C 00 00 00 00 00.....

YileKu avatar Sep 16 '24 00:09 YileKu

Hello @YileKu and thank you for your report

Did you try compiling with -DSIMDE_ACCURACY_PREFERENCE=2, or adding #define SIMDE_ACCURACY_PREFERENCE 2 before including the SIMDe header in your application?

mr-c avatar Sep 17 '24 12:09 mr-c

I will try that, thank you.

YileKu avatar Sep 18 '24 16:09 YileKu

So isn’t the precision implicit in the API? Are there other AVX APIs that need clarification when being mapped to NEON?

YileKu avatar Sep 18 '24 20:09 YileKu

So isn’t the precision implicit in the API? Are there other AVX apis that need a clarification when being mapped to NEON?

That's a good question. I didn't write this code. I think https://github.com/simd-everywhere/simde?tab=readme-ov-file#caveats should be updated with this information.

mr-c avatar Sep 18 '24 20:09 mr-c

Tried with the #define above and it still didn't work.

YileKu avatar Sep 19 '24 15:09 YileKu

The rsqrt instructions are interesting. They're not actually specified to require bit-accurate implementations, but are instead specified as being mathematically accurate to a given precision. See the Intel API docs https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_rsqrt_ss&ig_expand=5647 :

The maximum relative error for this approximation is less than 1.5*2^-12.

The instructions aren't even bit-compatible across CPU manufacturers… Intel and AMD return different values https://robert.ocallahan.org/2021/09/rr-trace-portability-diverging-behavior.html .

I'm not saying the implementation is perfect, only that bit-accurate results are not expected. It's possible some implementations have a higher maximum relative error than specified, but they should be pretty comparable, at least with a higher accuracy preference selected.
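
Since the documented contract is a relative-error bound and not a bit pattern, one way to compare the dumps in this thread is to decode them and test them against that bound. The following is a minimal sketch, not part of SIMDe: the helper names are hypothetical, the dumps are assumed to be little-endian, and the bound is the 1.5*2^-12 figure quoted above. The check is done on approx*approx*x so that no square root (and no libm) is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bound quoted from the Intel docs in this thread: 1.5 * 2^-12. */
static const double RSQRT_MAX_REL_ERR = 1.5 / 4096.0;

/* Hypothetical helper: rebuild a float from the first four bytes of a
 * dump such as "00 48 34 3C", assuming the bytes are little-endian. */
static float float_from_le_bytes(const unsigned char b[4])
{
    uint32_t bits = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                    ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: returns 1 if approx is within the quoted relative
 * error of 1/sqrt(x), for x > 0 and approx > 0.
 *
 * If approx = (1 + d) / sqrt(x), then approx * approx * x = (1 + d)^2,
 * so |d| < B is the same as (1 - B)^2 < approx^2 * x < (1 + B)^2. */
static int rsqrt_within_bound(float approx, double x)
{
    const double lo = (1.0 - RSQRT_MAX_REL_ERR) * (1.0 - RSQRT_MAX_REL_ERR);
    const double hi = (1.0 + RSQRT_MAX_REL_ERR) * (1.0 + RSQRT_MAX_REL_ERR);
    double r = (double)approx * (double)approx * x;
    return r > lo && r < hi;
}
```

To try it on the second example, pass the first four bytes of each LOCAL dump to float_from_le_bytes and then call rsqrt_within_bound with x = 8257.0. That input value is an assumption: with GCC or Clang, brace-initializing an __m128 with the integer 0x00002041 appears to convert it to the float 8257.0 and not to reinterpret the bits, and both dumped results are close to 1/sqrt(8257).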

nemequ avatar Sep 26 '24 14:09 nemequ

Thanks for the explanation.

YileKu avatar Sep 26 '24 15:09 YileKu