NEGEMMLowpMatrixMultiplyCore: QASYMM8 src1 & QASYMM8_SIGNED src2 support
According to the documentation, NEGEMMLowpMatrixMultiplyCore supports only a limited set of QSYMM8 and QASYMM8_SIGNED input combinations:
| src0 | src1 | src2 | dst |
|---|---|---|---|
| QASYMM8_SIGNED | QSYMM8 | S32 | QASYMM8_SIGNED |
| QASYMM8_SIGNED | QSYMM8 | S32 | S32 |
But we need to support QASYMM8 on src1 and QASYMM8_SIGNED on src2. Why is this combination not supported? Can I use a shift / zero-point in the second NEQuantizationLayer to work around the issue?
Are you going to support QASYMM8 on src1 and QASYMM8_SIGNED on src2 in the future? Thanks!
[UPD] Please note that I modified the examples to check support for QASYMM8 and QASYMM8_SIGNED inputs. You can easily explore the source code here: https://github.com/eshoguli/ComputeLibrary/commit/28c57d4f8de6df37d8edd031362160d76fda079e. There are no validation exceptions for QASYMM8 and QASYMM8_SIGNED inputs, but the output results are not correct.
Tensor logs for the QASYMM8 × QASYMM8_SIGNED case with incorrect results (examples/neon_gemm_u8s8_s32.cpp):
./build/examples/neon_gemm_u8s8_s32
Usage: ./build/neon_gemm_qasymm8 M N K
Too few or no inputs provided. Using default M=4, N=4, K=4
q_src1 QASYMM8:
25 0 0 0
0 25 0 0
0 0 25 0
0 0 0 25
q_src2 QASYMM8_SIGNED:
0 2 -3 5
-7 9 -10 12
-14 15 -17 19
-20 22 -24 26
Lowp GEMM output S32:
0 50 6325 125
6225 225 6150 300
6050 375 5975 475
5900 550 5800 650
Tensor logs for the QASYMM8_SIGNED × QASYMM8_SIGNED reference case with correct results (examples/neon_gemm_s8s8_s32.cpp):
./build/examples/neon_gemm_s8s8_s32
Usage: ./build/neon_gemm_qasymm8 M N K
Too few or no inputs provided. Using default M=4, N=4, K=4
find_implementation: a64_hybrid_s8s32_dot_6x16
find_implementation: a64_hybrid_s8s32_dot_6x16
find_implementation: a64_hybrid_s8s32_dot_6x16
q_src1 QASYMM8_SIGNED:
25 0 0 0
0 25 0 0
0 0 25 0
0 0 0 25
q_src2 QASYMM8_SIGNED:
0 2 -3 5
-7 9 -10 12
-14 15 -17 19
-20 22 -24 26
Lowp GEMM output S32:
0 50 -75 125
-175 225 -250 300
-350 375 -425 475
-500 550 -600 650
Hi @eshoguli, by next Monday, 05 August, I will have a clear answer on when this new feature will be provided. Thanks
Tested on commit:
commit c5dd7753d0475ffec0f192f3181fe67a1d761680 (tag: v24.07, origin/main, origin/HEAD, main)
Author: Jenkins <[email protected]>
Date: Fri Jul 26 12:07:30 2024 +0000
Compute Library v24.07
How to easily reproduce. Branch: es/aarch64/neon_gemm_u8i8_support/, example files:
- U8 + I8 = S32: neon_gemm_u8s8_s32_comparision.cpp
- U8 + I8 = F32: neon_gemm_u8s8_f32_comparision.cpp
build: scons arch=arm64-v8.2-a neon=1 opencl=0 openmp=0 cppthreads=1 os=macos data_layout_support=all build=native asserts=1 --jobs=8 --silent fixed_format_kernels=True validation_tests=1 examples=1 debug=0
run: ./build/examples/neon_gemm_u8s8_s32_comparision
expected result: 120 for each item of the result matrix; the actual value is 7560. Please note that if we change the signed value -2 to 2 here: https://github.com/eshoguli/ComputeLibrary/blob/es/aarch64/neon_gemm_u8i8_support/examples/neon_gemm_u8s8_f32_comparision.cpp#L174, then the results are OK.
Hi @eshoguli
The following patch adds mixed-sign support in GEMM and has already been merged to main.
I made some changes to your test neon_gemm_u8s8_f32_comparision.cpp to also compute SGEMM and compare its output with the GEMMLowp output. As you can see below, the output is -12 in both cases.
root@hikey:~/tmp/user/github# LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./neon_gemm_u8s8_f32_comparision 3 3 3
src1 F32 [6, 16, 1, 1]:
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
src2 F32 [16, 6, 1, 1]:
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
q_src1 QASYMM8_SIGNED [6, 16, 1, 1]:
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
q_src2 QASYMM8_SIGNED [16, 6, 1, 1]:
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
Lowp GEMM output F32 [16, 16, 1, 1]:
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
SGEMM F32 [16, 16, 1, 1]:
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
-12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12 -12
In your test I just added the following code at the end to print the SGEMM output:

```cpp
NEGEMM fgemm{};

Tensor dst;
dst.allocator()->init(TensorInfo(TensorShape(16, 16, 1, 1), 1, DataType::F32));
fgemm.configure(&src1, &src2, nullptr, &dst, 1, 0);
dst.allocator()->allocate();
fgemm.run();

// Print the SGEMM output
std::cout << "SGEMM " << dst.info() << ":" << std::endl;
dst.print(std::cout);

return 0;
}
```
Validated: the case with QASYMM8 + QASYMM8_SIGNED inputs and F32 output is supported in https://review.mlplatform.org/ml/ComputeLibrary, thanks! Please note that the fix has not yet landed in https://github.com/ARM-software/ComputeLibrary.