XNNPACK
Why does TEST(F16_VCMUL__NEONFP16ARITH_U8, batch_lt_8) fail with a Signal 7 error on the armv7a platform?
Hello, on the armv7a platform some test cases in f16-vcmul-test crash with a Signal 7 (SIGBUS) error, but I am not sure whether this is a bug. Please help me. The affected tests are:

TEST(F16_VCMUL__NEONFP16ARITH_U8, batch_lt_8)
TEST(F16_VCMUL__NEONFP16ARITH_U8, batch_gt_8)
TEST(F16_VCMUL__NEONFP16ARITH_U8, inplace_a)
TEST(F16_VCMUL__NEONFP16ARITH_U8, inplace_b)
TEST(F16_VCMUL__NEONFP16ARITH_U8, inplace_a_and_b)
TEST(F16_VCMUL__NEONFP16ARITH_U16, batch_lt_16)
TEST(F16_VCMUL__NEONFP16ARITH_U16, batch_gt_16)
TEST(F16_VCMUL__NEONFP16ARITH_U16, inplace_a)
TEST(F16_VCMUL__NEONFP16ARITH_U16, inplace_b)
TEST(F16_VCMUL__NEONFP16ARITH_U16, inplace_a_and_b)
TEST(F16_VCMUL__NEONFP16ARITH_U32, batch_lt_32)
TEST(F16_VCMUL__NEONFP16ARITH_U32, batch_gt_32)
TEST(F16_VCMUL__NEONFP16ARITH_U32, inplace_a)
TEST(F16_VCMUL__NEONFP16ARITH_U32, inplace_b)
TEST(F16_VCMUL__NEONFP16ARITH_U32, inplace_a_and_b)
Log analysis: In the XNNPACK source, the xnn_f16_vcmul_ukernel__neonfp16arith_u8 function casts a (uint16_t *) address to (uint32_t *). The resulting access requires a 4-byte-aligned address (the memory address must be a multiple of 4); otherwise the access is misaligned, which crashes the process immediately with a Signal 7 (SIGBUS) error.
1. Related source file paths:
XNNPACK/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u8.c
XNNPACK/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u16.c
XNNPACK/src/f16-vcmul/gen/f16-vcmul-neonfp16arith-u32.c
2. Declarations of the failing functions:

void xnn_f16_vcmul_ukernel__neonfp16arith_u8(
    size_t batch,
    const void* input_a,
    const void* input_b,
    void* output,
    const union xnn_f16_default_params params[restrict XNN_MIN_ELEMENTS(1)]) XNN_OOB_READS

void xnn_f16_vcmul_ukernel__neonfp16arith_u16(
    size_t batch,
    const void* input_a,
    const void* input_b,
    void* output,
    const union xnn_f16_default_params params[restrict XNN_MIN_ELEMENTS(1)]) XNN_OOB_READS

void xnn_f16_vcmul_ukernel__neonfp16arith_u32(
    size_t batch,
    const void* input_a,
    const void* input_b,
    void* output,
    const union xnn_f16_default_params params[restrict XNN_MIN_ELEMENTS(1)]) XNN_OOB_READS
3. Call sites of the failing functions:

// Call optimized micro-kernel.
vcmul(batch_size() * sizeof(uint16_t), a_data, b_data, y.data(),
      init_params != nullptr ? &params : nullptr);

// Call optimized micro-kernel.
vcmul(batch_size() * sizeof(float), a_data, b_data, y.data(),
      init_params != nullptr ? &params : nullptr);
After changing the cast to (void *), the test cases run normally. Is this really a bug?
Hi, thanks for reporting this. Fix incoming.
Thanks for catching the alignment issue.
Before the fix it was vst1_lane_u32((uint32_t*) or, vreinterpret_u32_f16(vaccr_lo), 0); or += 2; which compiles to stores carrying a 32-bit alignment hint ([r3:32], [r12:32]):
9d740: 04 00 10 e3 tst r0, #4
9d744: 03 00 00 0a beq 0x9d758 <xnn_f16_vcmul_ukernel__neonfp16arith_u8+0xe4> @ imm = #12
9d748: 3d 38 c3 f4 vst1.32 {d19[0]}, [r3:32]!
9d74c: a3 34 f3 f2 vext.32 d19, d19, d19, #1
9d750: 3d 18 cc f4 vst1.32 {d17[0]}, [r12:32]!
9d754: a1 14 f1 f2 vext.32 d17, d17, d17, #1
9d758: 02 00 10 e3 tst r0, #2
9d75c: 10 80 bd 08 popeq {r4, pc}
Now with vst1_lane_u32((void*) or, vreinterpret_u32_f16(vaccr_lo), 0); or += 2; the compiler emits plain stores with no alignment qualifier ([r3], [r12]):
9d740: 04 00 10 e3 tst r0, #4
9d744: 03 00 00 0a beq 0x9d758 <xnn_f16_vcmul_ukernel__neonfp16arith_u8+0xe4> @ imm = #12
9d748: 0d 38 c3 f4 vst1.32 {d19[0]}, [r3]!
9d74c: a3 34 f3 f2 vext.32 d19, d19, d19, #1
9d750: 0d 18 cc f4 vst1.32 {d17[0]}, [r12]!
9d754: a1 14 f1 f2 vext.32 d17, d17, d17, #1
9d758: 02 00 10 e3 tst r0, #2
9d75c: 10 80 bd 08 popeq {r4, pc}
Thanks for the report. This was fixed back in March in https://github.com/google/XNNPACK/commit/e5c870c1309a7857ceb9e4f5a2be30087d485874