XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Accumulating AVX rdsum microkernels
AMX QD8_F32_QC8W GEMM generate all tile sizes - MR 1 to 16 - NR 16,32,64
Building https://github.com/google/XNNPACK/commit/058ff10e0ba0a62d87fd39aa87418ce28b961755 with:
- cmake 3.29.2
- gcc 13.2.1
- binutils 2.42

using:

```
$ CFLAGS='-fPIC' cmake -B build -S xnnpack -DXNNPACK_BUILD_TESTS=ON -DXNNPACK_LIBRARY_TYPE=shared
$ cmake --build build
....
/usr/bin/ld: [...
```
Add `WAsm SIMD` microkernel for `f32-rsqrt`.
When no weight cache is provided to XNNPACK, create one so that packed weights can be shared between operations.
Several targets included a nonexistent file in `srcs`, so they would not build properly. The file they (apparently) want to include is already in the deps set, so simple...
Exported helper functions for transposition normalization.
Enable AVX512 and AVX2 F32_RADDSTOREEXPMINUSMAX microkernels; fix the AVX2/AVX512 batch size, which is measured in elements, not bytes.