Gregory Comer
As per previous design discussions, update subgraph node definition methods to allow for FP16 inputs. Add test coverage for FP16 support using subgraph API. Tested building with cmake and running...
Always re-generate sources (via extract_sources) when running cmake configure. This is done to resolve [T185463079](https://www.internalfb.com/intern/tasks/?t=185463079). Note that this does not impact iterative builds, which still work fine. This change prevents...
Support for x86 AMX was added in Clang 11 and GCC 11. There is existing logic in the CMake build to conditionally compile in the AMX kernels when the GCC...
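The version gate described above can also be expressed at the preprocessor level. The following is only a minimal sketch of that idea, not the logic in the XNNPACK CMake build: the macro `SKETCH_COMPILER_HAS_AMX` and the function `sketch_compiler_has_amx` are hypothetical names introduced for illustration, and the check assumes upstream compiler version numbers (Apple Clang uses a different numbering scheme, so a real check would need to treat it separately).

```c
/* Hypothetical compile-time gate; these names are illustrative and do not
 * exist in XNNPACK. Assumes upstream Clang/GCC version numbering. */
#if defined(__clang__)
  /* Clang also defines __GNUC__, so it must be tested first. */
  #define SKETCH_COMPILER_HAS_AMX (__clang_major__ >= 11)
#elif defined(__GNUC__)
  #define SKETCH_COMPILER_HAS_AMX (__GNUC__ >= 11)
#else
  /* Unknown compiler: conservatively assume no AMX support. */
  #define SKETCH_COMPILER_HAS_AMX 0
#endif

/* Returns 1 if the compiler building this file is new enough to be expected
 * to accept AMX intrinsics, 0 otherwise. This says nothing about whether the
 * CPU at run time supports AMX. */
int sketch_compiler_has_amx(void) {
  return SKETCH_COMPILER_HAS_AMX;
}
```

A compile-time gate like this only decides whether AMX kernels can be built; selecting them at run time still requires a separate CPU feature check.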
Summary: Add qb4w scalar 1x2, 1x4, 1x8, 2x2, 2x4, 2x8, and 4x4 kernels to XNNPACK. Add ExecuTorch op-level linear test coverage for 4-bit blockwise weights / fp16....
When building the llama_main target on macOS, the build fails with the following error:
```
[100%] Linking CXX executable llama_main
ld: warning: -s is obsolete
ld: unknown options: --gc-sections
clang: error:...
```
This PR updates test generation for blockwise (qb4w) kernels in preparation for ISA-specific kernels with kr > 2. Blockwise kernels currently enforce several constraints: 1) Kc is divisible by block...
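To illustrate why a blockwise kernel would constrain Kc relative to the block length, here is a rough scalar sketch of a blockwise 4-bit dot product. It is not XNNPACK's kernel or packing format. Everything in it is an assumption made for illustration: the function name `qb4w_dot_sketch`, the nibble order (low nibble first), the fixed zero point of 8, one float scale per block, and the requirements that `kc` is a multiple of `block_size` and that `block_size` is even.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine, not part of XNNPACK.
 * a:            kc int8 activations
 * w_packed:     kc/2 bytes, two unsigned 4-bit weights per byte,
 *               low nibble = even k, high nibble = odd k (assumed layout)
 * block_scales: kc/block_size float scales, one per block of weights
 * Assumes kc % block_size == 0 and block_size % 2 == 0. */
float qb4w_dot_sketch(const int8_t* a, const uint8_t* w_packed,
                      const float* block_scales, size_t kc,
                      size_t block_size) {
  float acc = 0.0f;
  for (size_t k0 = 0; k0 < kc; k0 += block_size) {
    /* Accumulate one block in integer arithmetic. */
    int32_t block_acc = 0;
    for (size_t k = k0; k < k0 + block_size; k += 2) {
      const uint8_t byte = w_packed[k / 2];
      /* Subtract the assumed zero point of 8 to get signed weights. */
      const int32_t w_lo = (int32_t)(byte & 0x0F) - 8;
      const int32_t w_hi = (int32_t)(byte >> 4) - 8;
      block_acc += (int32_t)a[k] * w_lo + (int32_t)a[k + 1] * w_hi;
    }
    /* Apply the per-block scale once the block is complete. */
    acc += (float)block_acc * block_scales[k0 / block_size];
  }
  return acc;
}
```

Because the scale is applied once per completed block, a Kc that is not a multiple of the block length would leave a partial block with no well-defined scale in this sketch, which is the kind of case a divisibility constraint rules out.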
This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting ARM Neon via the MLAL instruction family. Note: This PR includes one commit from https://github.com/google/XNNPACK/pull/6557 (Test generation update for qb4w)....
This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting x86 via the AVX2 instruction family. Note: This PR includes one commit from https://github.com/google/XNNPACK/pull/6557 (Test generation update for qb4w). I'm...
Testing CI on qb4w-subgraph branch of the XNNPACK library in preparation for XNNPACK upgrade.
Testing baseline full CI run for XNNPACK upgrade.