Max Ren

48 results for issues and pull requests by Max Ren

Summary: Replace all torch.ones with instances of torch.randn Differential Revision: D56907873

CLA Signed
fb-exported

Summary: Right now in XNNPACK we quantize bias to Int32 with scale being `act_scale * kernel_scale` and zero point being `0`. We have been simulating this quantization in our delegation logic,...
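The bias quantization scheme described above can be sketched in a few lines. This is a hypothetical illustration (the function name and scalar shapes are assumptions, not the actual XNNPACK delegate code): bias is quantized to int32 with scale `act_scale * kernel_scale` and zero point 0.

```python
def quantize_bias_int32(bias_fp32: float, act_scale: float, kernel_scale: float) -> int:
    """Quantize a bias value to int32 with scale = act_scale * kernel_scale, zp = 0."""
    bias_scale = act_scale * kernel_scale
    q = round(bias_fp32 / bias_scale)        # zero point is 0, so no offset is added
    int32_min, int32_max = -(2**31), 2**31 - 1
    return max(int32_min, min(int32_max, q))  # clamp into the int32 range
```

For example, with `act_scale = 0.1` and `kernel_scale = 0.05`, a bias of `0.5` quantizes to `0.5 / 0.005 = 100`.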

fb-exported
release notes: quantization
release notes: AO frontend

Summary: In the diff below we add quantizer annotations to the bias of GEMM operations. We update some of our delegate lowering logic to allow for this change Reviewed By: digantdesai...

CLA Signed
fb-exported

Summary: Adding a test for qc8 linear Reviewed By: digantdesai Differential Revision: D55941565

CLA Signed
fb-exported

Summary: These tests are all being skipped and have been marked with the associated tasks. However, on trunk this clutters the dashboard because it displays all of them as broken...

CLA Signed
fb-exported

This PR adds the FP16 AVX512SKX ukernels. Based on the QC4 kernels, FP16 was written only with MRx8c8 tile sizes and does not include the prefetch kernels. For parity...

This PR adds blockwise 4-bit GEMM microkernels targeting the x86 AVX512 instruction family. It includes only the fp32 and prefetch AVX512SKX ukernels. Tests and benchmarks were run on an Icelake Xeon processor....

This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting the x86 SSE2 and SSE4.1 instruction families. Note: This PR includes one commit from https://github.com/google/XNNPACK/pull/6557 (Test generation update for qb4w). I'm...

This pull request adds blockwise 4-bit (qb4w) GEMM microkernels targeting the x86 AVX instruction family. Note: Since AVX1 ukernels share the same meta kernels as the SSE2/SSE4.1 kernels, this PR sits on top...

Summary: In decomposition of Hardsigmoid, we see a divide by 6. When we lift this scalar to a tensor, we then see that the divisor is int64, and the dividend...
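The type-promotion behavior described above can be illustrated with a small PyTorch snippet. This is a sketch, not the actual decomposition code; it uses the standard hardsigmoid formula `clamp(x + 3, 0, 6) / 6` and shows that lifting the scalar divisor `6` to a tensor produces int64 by default, while dividing a float32 tensor by it still yields float32 under PyTorch's promotion rules.

```python
import torch

# Lifting the scalar 6 to a tensor yields an int64 tensor by default.
divisor = torch.tensor(6)
assert divisor.dtype == torch.int64

# Dividing a float32 tensor by the int64 divisor stays float32:
# float dtypes win over integer dtypes in PyTorch type promotion.
x = torch.randn(8)
y = torch.clamp(x + 3, 0, 6) / divisor   # standard hardsigmoid formula
assert y.dtype == torch.float32
```

The result matches `torch.nn.functional.hardsigmoid(x)` up to floating-point rounding.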

CLA Signed
fb-exported