simde
simde copied to clipboard
_mm_testc_si128 and _mm_testnzc_si128 faulty for arm
Hello,
I implement intrinsincs in D and reference simd-everywhere a lot for arm versions of the intrinsics. Thanks @nemequ for that! It is tremendously helpful.
I believe the SIMDE_ARM_NEON_A32V7_NATIVE
versions of simde_mm_testc_si128
and _mm_testnzc_si128
are faulty in simd-everywhere.
About _mm_testc_si128
This intrinsic tests if all bits masked by b
are 1's in a
.
Implementation is:
int64x2_t s64 = vbicq_s64(b_.neon_i64, a_.neon_i64);
return !(vgetq_lane_s64(s64, 0) & vgetq_lane_s64(s64, 1));
I suggest the following fix:
int64x2_t s64 = vbicq_s64(b_.neon_i64, a_.neon_i64);
return !(vgetq_lane_s64(s64, 0) | vgetq_lane_s64(s64, 1));
with a |
instead of &
.
Indeed spec says: "Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return the CF value."
The unittest that did reveal this bug in D is:
unittest
{
__m128i A = _mm_setr_epi32(0x01, 0x02, 0x04, 0xf8);
__m128i M1 = _mm_setr_epi32(0xfe, 0xfd, 0x00, 0x00);
__m128i M2 = _mm_setr_epi32(0x00, 0x00, 0x04, 0x00);
assert(_mm_testc_si128(A, A) == 1);
assert(_mm_testc_si128(A, M1) == 0);
assert(_mm_testc_si128(A, M2) == 1);
}
In my testing this makes the intrinsic agree with the instruction.
About the barely useful _mm_testnzc_si128
(not 100% sure about this one)
This intrinsic tests if all bits masked by b
have both 0's and 1's in a
, but are not only 0's or only 1's.
A similar problem exist with _mm_testnzc_si128
arm version.
Implementation is:
int64x2_t s640 = vandq_s64(b_.neon_i64, a_.neon_i64);
int64x2_t s641 = vbicq_s64(b_.neon_i64, a_.neon_i64);
return (((vgetq_lane_s64(s640, 0) | vgetq_lane_s64(s640, 1)) & (vgetq_lane_s64(s641, 0) | vgetq_lane_s64(s641, 1)))!=0);
The guide says (warning headache):
Compute the bitwise AND of 128 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
I suggest the following fix (D version):
long2 s640 = vandq_s64(cast(long2)b, cast(long2)a);
long2 s641 = vbicq_s64(cast(long2)b, cast(long2)a);
return !( !(vgetq_lane_s64(s641, 0) | vgetq_lane_s64(s641, 1))
| !(vgetq_lane_s64(s640, 0) | vgetq_lane_s64(s640, 1)) );
The unittest that revealed the bug for me was:
unittest
{
__m128i A = _mm_setr_epi32(0x01, 0x02, 0x04, 0xf8);
__m128i M = _mm_setr_epi32(0x01, 0x40, 0x00, 0x00);
__m128i Z = _mm_setzero_si128();
assert(_mm_testnzc_si128(A, Z) == 0);
assert(_mm_testnzc_si128(A, M) == 1);
assert(_mm_testnzc_si128(A, A) == 0);
}
In my testing this makes the intrinsic agree with the instruction on that unittest.