XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Reshape global sum pooling output
Refactor `xnn_tensor_get_size` to use a helper function `xnn_datatype_get_size_bits`. This also makes things a bit more generic (if we add 2-bit datatypes, it should just work, assuming they align the same way...).
Try to fix the OS Bazel build
Add a script to automate the `vunary` benchmarks generation from the microkernels' test specification. This effectively adds benchmarks for `f32-vabs`, `f32-vneg`, and `f32-vsqr`.
QS8/QD8 GEMM/IGEMM on AVX2: use 2x8c8 instead of 3x8c8, since 3x8 spills registers and is slower than 2x8
Add iterative and non-iterative `vrsqrt` microkernels for SSE.
Add an XNNPACK delegate for the `Rsqrt` node in TFLite.
Add QC8 VNNI GEMM microkernels
Shape inference for Add