XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
…Dot Product This PR is related to issue https://github.com/google/XNNPACK/issues/6454. This change adds qs8_qc8w gemm/igemm microkernels for the Wasm Relaxed SIMD dot product on signed and unsigned bytes. The new microkernels can...
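Relaxed SIMD exposes a fused signed-by-unsigned byte dot product that this kind of kernel can build on. Below is a minimal sketch of the inner-loop pattern using clang's wasm_simd128.h intrinsics; the function name, the flat 1x16 loop, and the horizontal reduction are illustrative assumptions, not the actual microkernel layout:

```c
// Build with: clang --target=wasm32 -msimd128 -mrelaxed-simd -O2
#include <stddef.h>
#include <stdint.h>
#include <wasm_simd128.h>

// Dot product of k signed activation bytes against weight bytes
// (assumed to fit in 7 bits, as the relaxed dot product's unsigned
// operand is only portable in that range); k a multiple of 16.
static int32_t dot_qs8_sketch(size_t k, const int8_t* a, const uint8_t* w) {
  v128_t vacc = wasm_i32x4_splat(0);
  for (size_t i = 0; i < k; i += 16) {
    const v128_t va = wasm_v128_load(a + i);
    const v128_t vw = wasm_v128_load(w + i);
    // Multiplies signed bytes of va by 7-bit bytes of vw, sums each
    // group of four products, and adds the sums into vacc's i32 lanes.
    vacc = wasm_i32x4_relaxed_dot_i8x16_i7x16_add(va, vw, vacc);
  }
  // Horizontal sum of the four i32 lanes.
  return wasm_i32x4_extract_lane(vacc, 0) + wasm_i32x4_extract_lane(vacc, 1)
       + wasm_i32x4_extract_lane(vacc, 2) + wasm_i32x4_extract_lane(vacc, 3);
}
```

A real gemm/igemm kernel would keep several such accumulators live across MR rows and NR columns rather than reducing after every row.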
Remove unnecessary NEON checks for div
To avoid the CFI invalid cast failures observed in http://b/349625080
The goal of this issue is to monitor development progress for this rather large feature with multiple contributors involved. Additionally, it serves as a vehicle to raise open questions, and...
before: fp_acc = 1/16 * (vksum * 16 + float(int_acc * 16) * scale)
after:  fp_acc = vksum + float(int_acc * 16) * scale / 16
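The two forms are algebraically identical: distributing the 1/16 across the sum reduces vksum * 16 / 16 to plain vksum, dropping the extra multiply on vksum. A minimal check of the identity (variable names follow the commit message; the values are arbitrary):

```c
#include <assert.h>
#include <math.h>

int main(void) {
  const float vksum = 3.0f, scale = 0.25f;
  const int int_acc = 42;
  const float before = 1.0f / 16.0f * (vksum * 16.0f + (float)(int_acc * 16) * scale);
  const float after  = vksum + (float)(int_acc * 16) * scale / 16.0f;
  // Identical in exact arithmetic: 1/16 * (x*16 + y) == x + y/16.
  // In float the two can differ by rounding, hence the tolerance.
  assert(fabsf(before - after) < 1e-6f);
  return 0;
}
```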
This PR is for the FP16 AVX512SKX ukernels. Based on the QC4 kernels, the FP16 variants were written only with MRx8c8 tile sizes and do not include the prefetch kernels. For parity...
This PR adds blockwise 4-bit GEMM microkernels targeting the x86 AVX512 instruction family. This only includes the fp32 and prefetch avx512skx ukernels. Tests and benchmarks were run on an Ice Lake Xeon processor...
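For reference, blockwise (qb4w) quantization shares one scale across each block of 4-bit weights along the reduction dimension. Below is a hedged scalar sketch of that dequantize-and-accumulate pattern; the flat nibble layout, the zero point of 8, and float per-block scales are illustrative assumptions, not XNNPACK's packed-weight format:

```c
#include <stddef.h>
#include <stdint.h>

// Computes one output element: dot(a[0..kc), w[0..kc)), where weights
// are stored two 4-bit nibbles per byte and each block_size run along
// k shares one scale. Requires kc % block_size == 0, block_size even.
static float gemm_qb4w_ref(size_t kc, size_t block_size,
                           const int8_t* a,          // int8 activations
                           const uint8_t* w_nibbles, // kc/2 packed bytes
                           const float* scales) {    // kc/block_size scales
  float acc = 0.0f;
  for (size_t b = 0; b < kc / block_size; b++) {
    int32_t block_acc = 0;  // integer accumulation within one block
    for (size_t i = 0; i < block_size; i++) {
      const size_t k = b * block_size + i;
      const uint8_t byte = w_nibbles[k / 2];
      const uint8_t nib = (k & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
      const int32_t wq = (int32_t)nib - 8;  // assumed zero point of 8
      block_acc += (int32_t)a[k] * wq;
    }
    acc += (float)block_acc * scales[b];  // apply the per-block scale
  }
  return acc;
}
```

An actual microkernel would vectorize the inner block loop and fold the nibble unpacking into the weight-load sequence rather than decoding one nibble at a time.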
This PR updates test generation for blockwise (qb4w) kernels in preparation for ISA-specific kernels with kr > 2. Blockwise kernels currently enforce several constraints: 1) kc is divisible by block...