XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
…Dot Product This PR is related to issue https://github.com/google/XNNPACK/issues/6454. This change adds qs8_qc8w gemm/igemm microkernels for the Wasm Relaxed SIMD dot product on signed and unsigned bytes. The new microkernels can...
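Relaxed SIMD exposes a fused signed-by-unsigned byte dot product that this kind of kernel can build on. Below is a minimal sketch of the inner-loop pattern using clang's wasm_simd128.h intrinsics; the function name, the flat 1x16 loop, and the horizontal reduction are illustrative assumptions, not the actual microkernel layout:

```c
// Build with: clang --target=wasm32 -msimd128 -mrelaxed-simd -O2
#include <stddef.h>
#include <stdint.h>
#include <wasm_simd128.h>

// Dot product of k signed activation bytes against weight bytes
// (assumed to fit in 7 bits, as the relaxed dot product's unsigned
// operand is only portable in that range); k a multiple of 16.
static int32_t dot_qs8_sketch(size_t k, const int8_t* a, const uint8_t* w) {
  v128_t vacc = wasm_i32x4_splat(0);
  for (size_t i = 0; i < k; i += 16) {
    const v128_t va = wasm_v128_load(a + i);
    const v128_t vw = wasm_v128_load(w + i);
    // Multiplies signed bytes of va by 7-bit bytes of vw, sums each
    // group of four products, and adds the sums into vacc's i32 lanes.
    vacc = wasm_i32x4_relaxed_dot_i8x16_i7x16_add(va, vw, vacc);
  }
  // Horizontal sum of the four i32 lanes.
  return wasm_i32x4_extract_lane(vacc, 0) + wasm_i32x4_extract_lane(vacc, 1)
       + wasm_i32x4_extract_lane(vacc, 2) + wasm_i32x4_extract_lane(vacc, 3);
}
```

A real gemm/igemm kernel would keep several such accumulators live across MR rows and NR columns rather than reducing after every row.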
Remove unnecessary NEON checks for div
To avoid the CFI invalid cast failures observed in http://b/349625080
The goal of this issue is to monitor development progress for this rather large feature with multiple contributors involved. Additionally, it serves as a vehicle to raise open questions, and...
before: fp_acc = 1/16 * (vksum * 16 + float(int_acc * 16) * scale)
after:  fp_acc = vksum + float(int_acc * 16) * scale / 16
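The two forms are algebraically identical: distributing the 1/16 across the sum reduces vksum * 16 / 16 to plain vksum, dropping the extra multiply on vksum. A minimal check of the identity (variable names follow the commit message; the values are arbitrary):

```c
#include <assert.h>
#include <math.h>

int main(void) {
  const float vksum = 3.0f, scale = 0.25f;
  const int int_acc = 42;
  const float before = 1.0f / 16.0f * (vksum * 16.0f + (float)(int_acc * 16) * scale);
  const float after  = vksum + (float)(int_acc * 16) * scale / 16.0f;
  // Identical in exact arithmetic: 1/16 * (x*16 + y) == x + y/16.
  // In float the two can differ by rounding, hence the tolerance.
  assert(fabsf(before - after) < 1e-6f);
  return 0;
}
```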
This PR is for the FP16 AVX512SKX ukernels. Based on the QC4 kernels, the FP16 variants were written only with MRx8c8 tile sizes and do not include the prefetch kernels. For parity...
This PR adds blockwise 4-bit GEMM microkernels targeting the x86 AVX512 instruction family. This only includes the fp32 and prefetch avx512skx ukernels. Tests and benchmarks were run on an Ice Lake Xeon processor...
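For reference, blockwise (qb4w) quantization shares one scale across each block of 4-bit weights along the reduction dimension. Below is a hedged scalar sketch of that dequantize-and-accumulate pattern; the flat nibble layout, the zero point of 8, and float per-block scales are illustrative assumptions, not XNNPACK's packed-weight format:

```c
#include <stddef.h>
#include <stdint.h>

// Computes one output element: dot(a[0..kc), w[0..kc)), where weights
// are stored two 4-bit nibbles per byte and each block_size run along
// k shares one scale. Requires kc % block_size == 0, block_size even.
static float gemm_qb4w_ref(size_t kc, size_t block_size,
                           const int8_t* a,          // int8 activations
                           const uint8_t* w_nibbles, // kc/2 packed bytes
                           const float* scales) {    // kc/block_size scales
  float acc = 0.0f;
  for (size_t b = 0; b < kc / block_size; b++) {
    int32_t block_acc = 0;  // integer accumulation within one block
    for (size_t i = 0; i < block_size; i++) {
      const size_t k = b * block_size + i;
      const uint8_t byte = w_nibbles[k / 2];
      const uint8_t nib = (k & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
      const int32_t wq = (int32_t)nib - 8;  // assumed zero point of 8
      block_acc += (int32_t)a[k] * wq;
    }
    acc += (float)block_acc * scales[b];  // apply the per-block scale
  }
  return acc;
}
```

An actual microkernel would vectorize the inner block loop and fold the nibble unpacking into the weight-load sequence rather than decoding one nibble at a time.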
This PR updates test generation for blockwise (qb4w) kernels in preparation for ISA-specific kernels with kr > 2. Blockwise kernels currently enforce several constraints: 1) kc is divisible by block...