XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
We needed to adjust XNN_ALLOCATION_ALIGNMENT to 128 bytes for HVX and use a predicated store for the tail part. ctest passed, but performance is not good yet. The next step naturally...
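As a rough illustration of the two changes described above, here is a minimal sketch in C. The helper name `store_f32_tail` and the surrounding structure are hypothetical, not XNNPACK's actual code; the intrinsic names follow the Hexagon SDK's `hvx_hexagon_protos.h` and should be verified against your SDK version.

```c
#include <stddef.h>
#include <hexagon_types.h>
#include <hvx_hexagon_protos.h>

/* HVX vector registers are 128 bytes wide, hence the larger alignment. */
#define XNN_ALLOCATION_ALIGNMENT 128

/* Hypothetical tail handler: writes the final nbytes (< 128) of an
 * accumulator with a single predicated store instead of a scalar loop. */
static void store_f32_tail(float* output, HVX_Vector vacc, size_t nbytes) {
  /* Build a predicate with only the first `nbytes` byte lanes enabled. */
  HVX_VectorPred qtail = Q6_Q_vsetq_R((int) nbytes);
  /* Predicated store: lanes outside the predicate are left untouched. */
  Q6_vmaskedstoreq_QAV(qtail, (HVX_Vector*) output, vacc);
}
```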
While building MediaPipe with XNNPACK, an error occurs:

(base) sstc@sstc-B450MH:~/0506/mediapipe$ bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/pose_tracking:pose_tracking_cpu --verbose_failures
WARNING: /home/sstc/0506/mediapipe/mediapipe/framework/BUILD:69:24: in cc_library rule //mediapipe/framework:calculator_cc_proto: target '//mediapipe/framework:calculator_cc_proto' depends on deprecated target '@com_google_protobuf//:cc_wkt_protos':...
Fix caching of weights in `create_gemm_or_igemm` in `convolution-nhwc.cc`. Previously, the memory was reserved unconditionally.
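A minimal sketch of the conditional-reservation pattern this fix implies, checking the cache before reserving rather than reserving up front. All names here (`weights_cache`, `get_packed_weights`) are illustrative, not XNNPACK's actual cache API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry weights cache. */
struct weights_cache {
  void* data;   /* packed weights, or NULL if not yet cached */
  size_t size;
};

/* Return packed weights, reserving and packing only on a cache miss.
 * Before the fix, the reservation happened unconditionally, even when
 * packed weights were already available. */
static void* get_packed_weights(struct weights_cache* cache,
                                const void* weights, size_t size) {
  if (cache->data != NULL && cache->size == size) {
    return cache->data;              /* cache hit: no new reservation */
  }
  void* packed = malloc(size);       /* reserve only on a miss */
  if (packed == NULL) return NULL;
  memcpy(packed, weights, size);     /* stand-in for real weight packing */
  cache->data = packed;
  cache->size = size;
  return packed;
}
```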
I hit two new errors, so I updated the build recipe for Hexagon. 1. XNNPACK_ENABLE_RISCV_VECTOR=ON caused a compilation error, so I disabled this CMake variable. If you find any issue...
AVX512FP16 - add compiler flag guard around fp16 code
Enable AVX512FP16 vmul vbinary microkernels
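For context on these two commits, such a guard typically keys off the compiler's predefined macro, so fp16 code is only compiled when the corresponding `-mavx512fp16` flag is active. A minimal sketch, with an illustrative function name (`_mm512_mul_ph` is the standard AVX512FP16 multiply intrinsic):

```c
#if defined(__AVX512FP16__)
  #include <immintrin.h>

  /* fp16 multiply body; compiled only when -mavx512fp16 is enabled,
   * so builds without the flag do not fail on the __m512h type. */
  static __m512h example_f16_vmul(__m512h a, __m512h b) {
    return _mm512_mul_ph(a, b);
  }
#endif  /* __AVX512FP16__ */
```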
NEON MLAL QS8 RSUM accumulating microkernels
Scalar QS8 RSUM accumulating microkernels
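In the spirit of these two commits, a minimal scalar sketch of the accumulating QS8 RSUM pattern they share; the name and signature are illustrative, not XNNPACK's exact microkernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulating reduce-sum over signed 8-bit inputs: the result is
 * added to the existing value in *output rather than overwriting it,
 * which is what distinguishes the "accumulating" variant. */
static void qs8_rsum_acc(size_t n, const int8_t* input, int32_t* output) {
  int32_t vacc = *output;            /* start from the previous partial sum */
  for (size_t i = 0; i < n; i++) {
    vacc += (int32_t) input[i];      /* widen to 32 bits before adding */
  }
  *output = vacc;
}
```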
- Initial implementation and test added.
- xnnpack/intrinsics-polyfill.h has the horizontal sum code (Q6_f32_vrsum_Vsf) using vshuff and vadd.
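The note above says the actual polyfill uses vshuff + vadd; the sketch below achieves the same horizontal sum with an equivalent rotate-and-add reduction for brevity. It assumes HVX v68+ float support, and the function name is illustrative; intrinsic names are from the Hexagon SDK headers.

```c
#include <hexagon_types.h>
#include <hvx_hexagon_protos.h>

/* Horizontal sum of the 32 fp32 lanes in a 128-byte HVX vector:
 * rotate by half the remaining width and add, five times. */
static float hvx_rsum_f32(HVX_Vector v) {
  for (int offset = 64; offset >= 4; offset >>= 1) {
    HVX_Vector rot = Q6_V_vror_VR(v, offset);
    /* vadd on sf inputs yields qf32; convert back to sf each step. */
    v = Q6_Vsf_equals_Vqf32(Q6_Vqf32_vadd_VsfVsf(v, rot));
  }
  union { HVX_Vector vec; float lanes[32]; } u;
  u.vec = v;
  return u.lanes[0];  /* after 5 steps every lane holds the total */
}
```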
Fix `generate-enum.py` to use `#include "..."` instead of `#include <...>`.
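The two forms differ in how the preprocessor resolves the path: the quoted form searches the including file's directory and project include paths before falling back to the system search, which is what a generated project header needs. An illustration of the two emitted lines (header name hypothetical):

```c
#include "xnnpack/operator-type.h"  /* quoted form: project paths searched first */
#include <xnnpack/operator-type.h>  /* angle form: system include paths only */
```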