XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Replaced `gavgpool` and `gsumpool` with `static_reduce`.
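For context, global average pooling (the old `gavgpool`) is just a mean reduction over the spatial axes, which is why it folds into a generic static reduce operator. A minimal C sketch of that equivalence; the function name and NHWC layout here are illustrative, not XNNPACK's API:

```c
#include <stddef.h>

// Global average pooling expressed as a static mean reduction over H and W.
// Dropping the final divide yields the sum reduction (the old gsumpool).
static void global_average_pool(
    const float* input,   // [batch, height, width, channels], NHWC
    float* output,        // [batch, channels]
    size_t batch, size_t height, size_t width, size_t channels) {
  const size_t spatial = height * width;
  for (size_t b = 0; b < batch; b++) {
    for (size_t c = 0; c < channels; c++) {
      float sum = 0.0f;
      for (size_t i = 0; i < spatial; i++) {
        sum += input[(b * spatial + i) * channels + c];
      }
      output[b * channels + c] = sum / (float) spatial;
    }
  }
}
```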
Integration of new KleidiAI kernels.
Set the requantization scale upper bound to 1.0.
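For background: in quantized kernels the requantization scale is `input_scale * weight_scale / output_scale`, and it must be bounded so it can be encoded as a fixed-point multiplier plus a non-negative right shift. A minimal sketch of such a validity check; the helper name is hypothetical, and treating the 1.0 bound as inclusive is an assumption here:

```c
#include <assert.h>

// Hypothetical helper: validate a requantization scale before converting it
// to a fixed-point multiplier. Scales at or below 1.0 can be represented as
// multiplier * 2^-shift with shift >= 0.
static int requantization_scale_is_valid(float input_scale,
                                         float weight_scale,
                                         float output_scale) {
  const float requantization_scale = input_scale * weight_scale / output_scale;
  assert(requantization_scale > 0.0f);
  return requantization_scale <= 1.0f;  // upper bound from this change
}
```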
Simplify the `xnn_pack_f{16,32}_gemm_g{io,oi}_w` functions to use `memcpy` and `memset` where appropriate (`kr=1` and `sr=1`). This significantly speeds up the packing of non-static right-hand operands to the `f32` and `f16` `FullyConnected`...
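When `kr=1` and `sr=1`, the values destined for one packed panel are contiguous in the source matrix, so the scalar gather loop degenerates into whole-row copies plus zero-fill of the padded tail. A simplified sketch of that idea for the GIO (input-major) case; this is illustrative only, not the exact `xnn_pack_f32_gemm_gio_w` signature or layout, and it omits the bias space the real packing reserves:

```c
#include <stddef.h>
#include <string.h>

// Pack an f32 GIO weight matrix [kc, nc] into NR-wide panels when kr == 1
// and sr == 1. For each k, the NR values of a panel are contiguous in the
// source, so the inner gather loop becomes one memcpy; short final panels
// are zero-padded with one memset.
static void pack_gio_kr1_sr1(
    size_t nc, size_t kc, size_t nr,
    const float* weights,  // [kc, nc], row-major (input-major)
    float* packed)         // ceil(nc / nr) panels of [kc, nr]
{
  for (size_t n = 0; n < nc; n += nr) {
    const size_t panel_width = (nc - n) < nr ? (nc - n) : nr;
    for (size_t k = 0; k < kc; k++) {
      memcpy(&packed[k * nr], &weights[k * nc + n],
             panel_width * sizeof(float));
      memset(&packed[k * nr + panel_width], 0,
             (nr - panel_width) * sizeof(float));
    }
    packed += kc * nr;
  }
}
```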
Add 5x8c8 wasmdot kernels, which perform better than the default configuration; in particular, they should see a further runtime speedup after AVX-256 revectorization (see [chromium bug](https://issues.chromium.org/issues/42202660)).
Add qs8 c4 wasmsdot templates, which can perform better than the default configuration. The new templates help generate more efficient AVX-256 revectorized code with very few inserts at run time...
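For context on the tile names in the two items above: `5x8c8` denotes a microkernel that computes an MR=5 by NR=8 block of the output while consuming the K dimension in groups of 8 (`c8`; `c4` uses groups of 4). A scalar reference of that loop structure, assuming signed 8-bit inputs and K a multiple of the group size; this only illustrates the naming, it is not the wasmdot/wasmsdot kernels themselves:

```c
#include <stddef.h>
#include <stdint.h>

// Scalar reference of a 5x8 (MR x NR) GEMM microkernel tile that walks K in
// groups of 8, matching the "5x8c8" naming. Real wasmdot kernels keep the
// 5x8 accumulator block in SIMD registers and compute each c8 group with a
// dot-product instruction.
enum { MR = 5, NR = 8, KR = 8 };

static void gemm_5x8c8_reference(
    size_t kc,         // K, assumed to be a multiple of KR
    const int8_t* a,   // [MR, kc]
    const int8_t* b,   // [kc / KR, NR, KR], packed
    int32_t* c)        // [MR, NR] accumulators
{
  for (size_t m = 0; m < MR; m++) {
    for (size_t n = 0; n < NR; n++) {
      int32_t acc = 0;
      for (size_t k = 0; k < kc; k += KR) {
        // One "c8" group: an 8-wide dot product feeding the accumulator.
        for (size_t kk = 0; kk < KR; kk++) {
          acc += (int32_t) a[m * kc + k + kk] *
                 (int32_t) b[(k / KR) * NR * KR + n * KR + kk];
        }
      }
      c[m * NR + n] = acc;
    }
  }
}
```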
Add initial bits for RNDNU16 requantization.
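RNDNU is XNNPACK's round-to-nearest, ties-up requantization scheme: scale the accumulator by a fixed-point multiplier, then shift right with a rounding bias of 2^(shift-1), which sends ties toward positive infinity. A generic sketch of that arithmetic; the clamp to int8 is only for illustration, and the 16-bit target of the new RNDNU16 variant is an assumption from its name:

```c
#include <assert.h>
#include <stdint.h>

// Round-to-nearest, ties-up (RNDNU) requantization of a 32-bit accumulator.
static int8_t requantize_rndnu(int32_t acc, int32_t multiplier,
                               uint32_t shift, int32_t zero_point) {
  assert(shift >= 1 && shift < 56);
  const int64_t product = (int64_t) acc * (int64_t) multiplier;
  // Adding 2^(shift-1) before the arithmetic right shift rounds to nearest,
  // with ties going up (toward +infinity) for both signs.
  const int64_t rounding = INT64_C(1) << (shift - 1);
  int32_t out = (int32_t) ((product + rounding) >> shift) + zero_point;
  if (out < -128) out = -128;  // clamp to the int8 output range
  if (out > 127) out = 127;
  return (int8_t) out;
}
```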
Put back a missing packing optimization.