XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

342 XNNPACK issues, sorted by recently updated

Replaced gavgpool and gsumpool with static_reduce.
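For context, a minimal sketch of the equivalence behind that change (not XNNPACK's subgraph API; the function name and NHWC layout here are assumptions): global average pooling is just a static mean reduction over the spatial H and W axes, and dropping the scale factor gives global sum pooling.

```c
#include <stddef.h>

/* Hypothetical reference: global average pooling over an NHWC tensor,
 * expressed as a static mean-reduce over the spatial (H, W) axes. */
static void global_average_pool_nhwc(
    size_t batch, size_t height, size_t width, size_t channels,
    const float* input,   /* [batch][height][width][channels] */
    float* output)        /* [batch][channels] */
{
  const float scale = 1.0f / (float) (height * width);
  for (size_t n = 0; n < batch; n++) {
    for (size_t c = 0; c < channels; c++) {
      float sum = 0.0f;
      for (size_t h = 0; h < height; h++) {
        for (size_t w = 0; w < width; w++) {
          sum += input[((n * height + h) * width + w) * channels + c];
        }
      }
      /* Omit the scale to get a global sum pool instead. */
      output[n * channels + c] = sum * scale;
    }
  }
}
```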

Simplify the `xnn_pack_f{16,32}_gemm_g{io,oi}_w` functions to use `memcpy` and `memset` where appropriate (`kr=1` and `sr=1`). This significantly speeds up the packing of non-static right-hand operands to the `f32` and `f16` `FullyConnected`...
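A minimal sketch of the idea for the GIO case (not XNNPACK's actual packing routine; the simplified signature, tile layout, and omission of bias handling are assumptions): when `kr=1` and `sr=1`, each row of an `nr`-wide weight tile is contiguous in the source matrix, so the per-element packing loop collapses into a `memcpy`, with `memset` zero-padding the last partial tile.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packer for an f32 GIO-layout weight matrix into nr-wide tiles,
 * specialized for kr == 1 and sr == 1. */
static void pack_f32_gemm_gio_w_kr1_sr1(
    size_t nc, size_t kc, size_t nr,  /* output channels, reduction dim, tile width */
    const float* w,                   /* source weights, [kc][nc] row-major (GIO) */
    float* packed_w)                  /* destination: ceil(nc/nr) tiles of nr columns */
{
  for (size_t n = 0; n < nc; n += nr) {
    const size_t tile_n = nc - n < nr ? nc - n : nr;
    for (size_t k = 0; k < kc; k++) {
      /* One contiguous run of tile_n floats per (tile, k) pair. */
      memcpy(packed_w, &w[k * nc + n], tile_n * sizeof(float));
      /* Zero-pad the remainder of a partial final tile. */
      memset(packed_w + tile_n, 0, (nr - tile_n) * sizeof(float));
      packed_w += nr;
    }
  }
}
```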

Add 5x8c8 wasmdot kernels, as they perform better than the default configuration. In particular, they achieve a larger speedup at runtime after AVX-256 revectorization (see [chromium bug](https://issues.chromium.org/issues/42202660))...

Add qs8 c4 wasmsdot templates, which can perform better than the default configuration. The new templates help generate more efficient AVX-256 revectorized code with very few inserts at run time...