simde icon indicating copy to clipboard operation
simde copied to clipboard

_mm_insert_ps is broken on ARM processor

Open MirJawadMairaj opened this issue 2 years ago • 2 comments

Hi,

I noticed that _mm_insert_ps is broken on 0.7.2 (and I noticed still on master as well).

I have a fix that worked for me looking at: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=SSE4_1&ig_expand=4048

SIMDE_FUNCTION_ATTRIBUTES
simde__m128
simde_mm_insert_ps (simde__m128 a, simde__m128 b, const int imm8)
    SIMDE_REQUIRE_CONSTANT_RANGE(imm8, 0, 255)  {
  simde__m128_private
    r_,
    a_ = simde__m128_to_private(a),
    b_ = simde__m128_to_private(b);
  float tmp1_ = b_.f32[(imm8 >> 6) & 3];
  a_.f32[(imm8 >> 4) & 3] = tmp1_;
  SIMDE_VECTORIZE
  for (size_t i = 0 ; i < (sizeof(r_.f32) / sizeof(r_.f32[0])) ; i++) {
    r_.f32[i] = ((imm8 >> i) & 1 ) ? SIMDE_FLOAT32_C(0.0) : a_.f32[i];
  }
  return simde__m128_from_private(r_);
}

Please consider adding this to the new release or a similar fix.

The issue was a wrong copy and later missed masking.

For context the original is:

SIMDE_FUNCTION_ATTRIBUTES
simde__m128
simde_mm_insert_ps (simde__m128 a, simde__m128 b, const int imm8)
    SIMDE_REQUIRE_CONSTANT_RANGE(imm8, 0, 255)  {
  simde__m128_private
    r_,
    a_ = simde__m128_to_private(a),
    b_ = simde__m128_to_private(b);
  a_.f32[0] = b_.f32[(imm8 >> 6) & 3];
  a_.f32[(imm8 >> 4) & 3] = a_.f32[0];
  SIMDE_VECTORIZE
  for (size_t i = 0 ; i < (sizeof(r_.f32) / sizeof(r_.f32[0])) ; i++) {
    r_.f32[i] = (imm8 >> i) ? SIMDE_FLOAT32_C(0.0) : a_.f32[i];
  }
  return simde__m128_from_private(r_);
}

Best Regards, Jawad

MirJawadMairaj avatar Nov 03 '21 16:11 MirJawadMairaj

@MirJawadMairaj Thank you for your report and suggested fix. Can you make a pull request?

mr-c avatar Nov 03 '21 18:11 mr-c

Hi! I added here: https://github.com/simd-everywhere/simde/pull/932

Hopefully that works. I made the pull request on master.

Didn't change the unit test as I am not running the test suite. If that is a requirement than please guide how to set that up on windows.

I might have some time later next week to look into it more.

Best Regards, Jawad

MirJawadMairaj avatar Nov 04 '21 11:11 MirJawadMairaj