XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Reshape global sum pooling output
Refactor `xnn_tensor_get_size` to use a helper function `xnn_datatype_get_size_bits`. This also makes things a bit more generic (if we add 2-bit datatypes, it should just work, assuming they align the same way...).
Try to fix the OS Bazel build
Add a script to automate the `vunary` benchmarks generation from the microkernels' test specification. This effectively adds benchmarks for `f32-vabs`, `f32-vneg`, and `f32-vsqr`.
QS8/QD8 GEMM/IGEMM on AVX2: use 2x8c8 instead of 3x8c8, since 3x8 spills registers and is slower than 2x8
Add iterative and non-iterative `vrsqrt` microkernels for SSE.
Add an XNNPACK delegate for the `Rsqrt` node in TFLite.
Add QC8 VNNI GEMM microkernels
Shape inference for Add