simde icon indicating copy to clipboard operation
simde copied to clipboard

x86 and neon max/min are not the same

Open moon-chilled opened this issue 1 year ago • 9 comments

Currently, _mm_max_pd is implemented in terms of vmaxq_f64 on neon. But their behaviour is not quite the same. When handed zeroes of opposite signs, the latter will return positive zero, but the former will return its second argument. The situation with min is analogous.

moon-chilled avatar Jul 26 '22 07:07 moon-chilled

Thank you @moon-chilled for your report. Can you send a PR to fix this? Or suggest test cases that would make the problem clear?

mr-c avatar Jan 18 '23 17:01 mr-c

_mm_max_pd(0,-0) = -0

vmaxq_f64(0,-0) = 0

moon-chilled avatar Jan 21 '23 23:01 moon-chilled

Thank @moon-chilled for the test cases; I don't know NEON well enough to fix this. A PR or suggestion from anyone is very welcome

mr-c avatar Jan 22 '23 13:01 mr-c

solutions from sse2neon for __aarch64__:

_mm_max_pd: r_.neon_f64 = vbslq_f64(vcgtq_f64(a_.neon_f64, b_.neon_f64), a_.neon_f64, b_.neon_f64);

_mm_min_pd: r_.neon_f64 = vbslq_f64(vcltq_f64(a_.neon_f64, b_.neon_f64), a_.neon_f64, b_.neon_f64);

_mm_min_sd: return simde_mm_move_sd(a, simde_mm_min_pd(a, b));

_mm_max_sd: return simde_mm_move_sd(a, simde_mm_max_pd(a, b));

solutions from sse2neon for ARMv7-A:

_mm_max_ps is already correct.

_mm_max_ss: simde_mm_move_ss(a, simde_mm_max_ps(a, b));

_mm_min_ps is missing a case: r_.neon_f32 = vbslq_f32(vcltq_f32(a_.neon_f32, b_.neon_f32), a_.neon_f32, b_.neon_f32);

_mm_min_ss: return simde_mm_move_ss(a, simde_mm_min_ps(a, b));

I wonder about other targets WASM, POWER, etc.

aqrit avatar Jan 22 '23 15:01 aqrit

Thank you @aqrit ; do you have time to send a PR with the test case in https://github.com/simd-everywhere/simde/issues/963#issuecomment-1399357212 along with those fixes?

mr-c avatar Jan 22 '23 16:01 mr-c

yeah, I'll put it together tonight.

aqrit avatar Jan 22 '23 17:01 aqrit

_mm_min_ps & _mm_min_ss seem to have different behavior. So I'll have to put this off till tomorrow night.

aqrit avatar Jan 23 '23 04:01 aqrit

_mm_min_ps & _mm_min_ss seem to have different behavior.

GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99497

aqrit avatar Jan 23 '23 23:01 aqrit

_mm_min_ps & _mm_min_ss seem to have different behavior.

GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99497

Yuck, what a mess. Maybe we should adjust and have two sets of functions: one set always follows the Intel intrinsic spec , and the other set always follows the GCC behavior.

mr-c avatar Jan 24 '23 11:01 mr-c