XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Add F16C f16-f32acc rdsum microkernels
Enable `-mavx512fp16`, which is needed for the AVX512FP16 microkernels
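As a minimal illustration of what that flag gates: GCC and Clang define the `__AVX512FP16__` macro when `-mavx512fp16` is passed, so AVX512FP16 code can be guarded on it. The function below is a hypothetical placeholder, not XNNPACK's actual source layout.

```c
/* Minimal sketch: guard AVX512FP16 code on the macro that -mavx512fp16
 * defines, so the file compiles to nothing when the flag is absent.
 * example_avx512fp16_kernel is hypothetical, not an XNNPACK kernel. */
#if defined(__AVX512FP16__)
#include <immintrin.h>

void example_avx512fp16_kernel(void) {
  /* Native f16 arithmetic is available here, e.g. _mm512_add_ph(). */
}
#endif  /* __AVX512FP16__ */
```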
Add F16F32ACC AVX512SKX rdsum accumulating microkernels
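To make the "f16-f32acc" naming in the two rdsum entries above concrete, here is a minimal sketch of the underlying idea: load f16 inputs, widen them to f32 (here with the F16C `VCVTPH2PS` instruction), and keep the running sum in f32 so the reduction does not lose precision. The function name and loop structure are illustrative only and do not reproduce XNNPACK's generated kernels; compile with `-mf16c`.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not XNNPACK's kernel): sum n IEEE f16 values into an
 * f32 accumulator, widening with F16C before accumulating ("f16-f32acc"). */
float f16_f32acc_rdsum_sketch(size_t n, const uint16_t* x) {
  __m256 vacc = _mm256_setzero_ps();
  for (; n >= 8; n -= 8) {
    /* Load 8 f16 values and widen them to f32 with VCVTPH2PS. */
    const __m256 vx = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i*) x));
    x += 8;
    vacc = _mm256_add_ps(vacc, vx);  /* accumulate in f32, not f16 */
  }
  /* Horizontal reduction of the 8 f32 lanes. */
  __m128 vsum = _mm_add_ps(_mm256_castps256_ps128(vacc),
                           _mm256_extractf128_ps(vacc, 1));
  vsum = _mm_add_ps(vsum, _mm_movehl_ps(vsum, vsum));
  vsum = _mm_add_ss(vsum, _mm_movehdup_ps(vsum));
  float sum = _mm_cvtss_f32(vsum);
  /* Scalar tail for the remaining 0..7 elements. */
  for (; n != 0; n -= 1) {
    sum += _cvtsh_ss(*x++);
  }
  return sum;
}
```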
Add partial support for building/testing/benchmarking XNNPACK on Hexagon. Additional work would need to be done to get this fully working in the Bazel build (notably, connecting to a Qualcomm SDK)...
Roll back #6365 (enable F16-RMINMAX and F16-RMAX microkernels using AVX512 FP16 arithmetic); it breaks some internal tests.
Test `packing-test --gtest_filter="PACK_QD8_F32_QB4W_GEMM_GOI_W.*"`
Add AVX512FP16 vbinary microkernels: use native FP16 arithmetic on AVX512, via the vop and vopc templates, covering add, sub, mul, div, max, min, and sqrdiff
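As a hedged sketch of what one of these ops looks like in native f16 arithmetic, the function below computes sqrdiff, `out[i] = (a[i] - b[i])^2`, with AVX512FP16 intrinsics. It is illustrative only and does not reproduce XNNPACK's vop/vopc template structure; a real kernel would handle the tail with masked vector loads rather than a scalar loop. Requires `-mavx512fp16`.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical sketch (not the generated XNNPACK template): an AVX512FP16
 * vbinary microkernel computing sqrdiff entirely in native f16 arithmetic.
 * A __m512h vector holds 32 half-precision lanes. */
void f16_vsqrdiff_sketch(size_t n, const _Float16* a, const _Float16* b,
                         _Float16* out) {
  for (; n >= 32; n -= 32) {
    const __m512h va = _mm512_loadu_ph(a);
    const __m512h vb = _mm512_loadu_ph(b);
    a += 32;
    b += 32;
    const __m512h vd = _mm512_sub_ph(va, vb);      /* difference in f16 */
    _mm512_storeu_ph(out, _mm512_mul_ph(vd, vd));  /* square in f16 */
    out += 32;
  }
  /* Scalar tail for the remaining 0..31 elements. */
  for (; n != 0; n -= 1) {
    const _Float16 d = *a++ - *b++;
    *out++ = d * d;
  }
}
```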
Add f16_vsqrdiff_test and f16_vsqrdiffc_test build targets. Fixes #6395
F32 and F16 are missing vbinary benchmarks. There are vunary benchmarks, which are generated from tests, and a few of the 8-bit ops do have vbinary benchmarks: qs8-vadd.cc, qs8-vaddc.cc, qs8-vmul.cc, qs8-vmulc.cc, qu8-vadd.cc, qu8-vaddc.cc, qu8-vmul.cc...
The :f16_vsqrdiff_test and :f16_vsqrdiffc_test targets are missing, but they do exist for F32.