XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Fix a crash in an internal benchmark with Relaxed SIMD
QS8 AVX2 broadcast: reorder input and weight loads before conversions
QS8 AVX2 broadcast: unroll loads before conversion (cvt)
Call the XNNPACK transpose from the TfLite transpose op and remove the old optimized implementation
Average pool subgraph supports QU8
Delegate QU8 average pooling to XNNPACK
Generate neondot qc4w benchmarks (kr is in bytes)
Rename xnn_qd8_f32_qc4w_gemm_minmax_ukernel_fn and xnn_qd8_f32_qc8w_gemm_minmax_ukernel_fn
Rename X8 scalar gemm microkernels with u1 suffix
QS8 scalar GEMM template supports unrolled microkernels; unroll WASM by 4