XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Accumulating AVX rdsum microkernels
AMX QD8_F32_QC8W GEMM generate all tile sizes - MR 1 to 16 - NR 16,32,64
Building https://github.com/google/XNNPACK/commit/058ff10e0ba0a62d87fd39aa87418ce28b961755 with:
- cmake 3.29.2
- gcc 13.2.1
- binutils 2.42

using:

```
$ CFLAGS='-fPIC' cmake -B build -S xnnpack -DXNNPACK_BUILD_TESTS=ON -DXNNPACK_LIBRARY_TYPE=shared
$ cmake --build build
....
/usr/bin/ld: [...
```
Add `WAsm SIMD` microkernel for `f32-rsqrt`.
When no weight cache is provided to XNNPACK, create one so that packed weights can be shared between operations.
Several targets included a nonexistent file in `srcs`, so they would not build properly. The file they (apparently) want to include is already in the deps set, so simple...
Exported helper functions for transposition normalization.
Enable AVX512 and AVX2 F32_RADDSTOREEXPMINUSMAX microkernels; fix the AVX2/AVX512 batch size, which is measured in elements, not bytes.