XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
### Goal
Enable x32-packw to speed up the dynamic fully connected layer for LLM models.
### Background
The GEMM u-kernel uses the input and packed_weight (weight and bias) to calculate the output values. Our GEMM...
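The weight-packing step described above can be sketched in C. This is a hypothetical x32 packer that, for each tile of NR output channels, stores the biases first and then interleaves the weights so the GEMM u-kernel can stream packed_weight sequentially; the function name and exact layout are illustrative assumptions, not XNNPACK's actual packing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical x32 weight packing: for each tile of nr output channels,
 * store the nr biases first, then the weights k-major and channel-
 * interleaved within the tile. Channels past nc are zero-padded.
 * Illustrative sketch only. */
void pack_x32(size_t nc, size_t kc, size_t nr,
              const uint32_t *weights,  /* nc x kc, row-major */
              const uint32_t *bias,     /* nc entries */
              uint32_t *packed)         /* round_up(nc, nr) * (kc + 1) entries */
{
  for (size_t nb = 0; nb < nc; nb += nr) {
    /* Bias block for this tile. */
    for (size_t n = 0; n < nr; n++) {
      *packed++ = (nb + n < nc) ? bias[nb + n] : 0;
    }
    /* Weight block: for each k, one value per channel in the tile. */
    for (size_t k = 0; k < kc; k++) {
      for (size_t n = 0; n < nr; n++) {
        *packed++ = (nb + n < nc) ? weights[(nb + n) * kc + k] : 0;
      }
    }
  }
}
```

With this layout the u-kernel reads one contiguous stream per tile: nr biases to initialize the accumulators, then kc groups of nr weights.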
Several tests fail when built for Hexagon and run under the simulator; the QuRT exit code (0x2001) indicates the failures are loads from misaligned addresses:
```
test/qs8_dwconv_minmax_multipass_fp32_test
test/qs8_qc8w_dwconv_minmax_multipass_fp32_test
test/qu8_dwconv_minmax_multipass_fp32_test
```
...
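A common portable fix for this class of failure is to route unaligned accesses through memcpy instead of dereferencing a misaligned pointer. The sketch below is a generic illustration of that technique, not the actual patch applied to these kernels:

```c
#include <stdint.h>
#include <string.h>

/* On targets that trap on misaligned accesses (as the QuRT exit code
 * suggests Hexagon does here), dereferencing a uint32_t* that is not
 * 4-byte aligned is undefined behavior. A memcpy-based load is the
 * portable fix; compilers lower it to a single plain load on targets
 * where the alignment is allowed. */
static inline uint32_t load_u32_unaligned(const void *p) {
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}
```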
Don't use mmap/munmap/mprotect for XNN_PLATFORM_QURT: these functions aren't available to ordinary user code there. Use qurt_alloc/qurt_free instead.
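The platform split described above can be sketched as a pair of allocation macros. The XNN_PLATFORM_QURT branch only compiles against the Hexagon SDK, so this illustration substitutes malloc/free for the mmap path so it runs anywhere; the macro names are hypothetical.

```c
#include <stdlib.h>

/* On QuRT, ordinary user code cannot call mmap/munmap/mprotect, so
 * allocation falls back to qurt_alloc/qurt_free. The non-QuRT branch
 * would use mmap/munmap in real code; malloc/free stand in here so
 * the sketch is portable. */
#if XNN_PLATFORM_QURT
  #include <qurt_alloc.h>
  #define xnn_sys_alloc(size)     qurt_alloc(size)
  #define xnn_sys_free(ptr, size) qurt_free(ptr)
#else
  #define xnn_sys_alloc(size)     malloc(size)
  #define xnn_sys_free(ptr, size) free(ptr)
#endif
```

Note that qurt_free takes only the pointer, while munmap needs the mapping size; keeping the size parameter in the macro lets both branches share one call site.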
Some of our tests (e.g., for testing the code cache) use mmap()/munmap()/etc. for memory management. On the simulator, these tests appear to run just fine, but on a Samsung S22, these...
When this test is built for Hexagon and executed on the simulator, a number of the test cases fail with wildly incorrect output. (Interestingly, this is the *only* test under the...
Add WAsmSIMD rdsum accumulating microkernels
Add F16F32ACC NEONFP16ARITH rdsum accumulating microkernels
Add f32 Maxpool RVV implementation microkernels for LMUL 1 and 2, tests and config changes.
Add AVX512F rdsum accumulating microkernels