XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Fix a TensorFlow Lite crash in `xnn_create_convert_nc_qs8` caused by a null pointer dereference in the Android Emulator on armeabi-v7a.
This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting ARM Neon via the MLAL instruction family. Note: This PR includes one commit from https://github.com/google/XNNPACK/pull/6557 (Test generation update for qb4w)....
This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting the x86 SSE2 and SSE4.1 instruction families. Note: This PR includes one commit from https://github.com/google/XNNPACK/pull/6557 (Test generation update for qb4w). I'm...
This PR aims to enable the RVV GEMM, IGEMM, and X32-PACKW microkernels in the GEMM config, which in turn enables the RVV implementation in the operator API.
This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting the x86 AVX instruction family. Note: Since the AVX1 microkernels share the same meta kernels as the SSE2/4.1 kernels, this PR sits on top...
This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting x86 via the AVX2 instruction family. Note: This PR includes one commit from https://github.com/google/XNNPACK/pull/6557 (Test generation update for qb4w). I'm...
By default, XNNPACK uses the 5x16 fp32-gemm kernel for `x86_fma3`, but we found that the 4x16s4 kernel performs better on a Meteor Lake CPU (`Intel(R) Core(TM) Ultra 7 155H`) | benchmark |...
It seems part of the code hasn't been compiled. Any idea how to fix it? Thanks in advance! ``` FAILED: subgraph-size-test.exe C:\windows\system32\cmd.exe /C "cd . && C:\Programs\Python\Python311-arm64\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_exe...
Added standalone rsum HVX