[CPU] Enable DA8W4 on CPU
Summary
This PR enables DA8W4 on CPU.
- It adds a new layout, `Int8DynamicActInt4WeightCPULayout`, and its implementation.
- It adds two custom ops: `da8w4_linear_prepack_cpu` for weight packing and `da8w4_linear_cpu` for DA8W4 GEMM.
- It adds C++ kernels for the two new custom ops.
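For readers unfamiliar with int4 weight packing, here is a minimal sketch of the general idea: two signed 4-bit values stored in one byte. It is an illustration only. The helper names `pack_int4` and `unpack_int4` are hypothetical, and the memory layout actually produced by the prepack op is not described in this thread and may well differ (for example, it may be blocked for the kernels).

```python
from typing import List


def pack_int4(values: List[int]) -> bytes:
    """Pack signed int4 values (-8..7) two per byte, low nibble first.

    Hypothetical helper for illustration; not the layout used by the PR's op.
    """
    if len(values) % 2:
        raise ValueError("expected an even number of int4 values")
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("int4 values must lie in [-8, 7]")
    # Masking with 0xF keeps the two's-complement low 4 bits of each value.
    return bytes(
        (lo & 0xF) | ((hi & 0xF) << 4)
        for lo, hi in zip(values[0::2], values[1::2])
    )


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4: recover the signed int4 values."""
    out: List[int] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out
```

Packing halves the weight storage compared with keeping one int4 value per int8 byte, which is the usual motivation for a separate prepack step.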
The ops and kernels are available only when torchao is built from source with USE_CPP_KERNELS=1, and only on Linux.
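As background, DA8W4 means dynamically quantized 8-bit activations with 4-bit weights. The pure-Python sketch below shows the arithmetic such a linear layer performs, under assumptions that are not stated in this thread: per-row (per-token) asymmetric int8 activations, symmetric per-group int4 weights, and integer accumulation within each group. All function names are hypothetical, and this is a reference for the math, not the PR's kernel.

```python
from typing import List, Tuple


def quantize_activation_row(row: List[float]) -> Tuple[List[int], float, int]:
    """Dynamic asymmetric int8 quantization of one activation row."""
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero row
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(x / scale) + zero_point)) for x in row]
    return q, scale, zero_point


def quantize_weight_row(row: List[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric int4 quantization with one scale per group (an assumption)."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        scale = max(abs(w) for w in group) / 7.0 or 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def da8w4_linear_reference(
    x: List[List[float]], w: List[List[float]], group_size: int
) -> List[List[float]]:
    """Compute x @ w.T using int8 activations and int4 weights."""
    # Weights are quantized once; activations are quantized per call.
    wq = [quantize_weight_row(r, group_size) for r in w]
    out: List[List[float]] = []
    for row in x:
        xq, x_scale, x_zp = quantize_activation_row(row)
        y: List[float] = []
        for q_w, w_scales in wq:
            acc = 0.0
            for g, w_scale in enumerate(w_scales):
                lo = g * group_size
                hi = min(lo + group_size, len(row))
                # Integer dot product within the group, then rescale to float.
                int_acc = sum((xq[k] - x_zp) * q_w[k] for k in range(lo, hi))
                acc += int_acc * x_scale * w_scale
            y.append(acc)
        out.append(y)
    return out
```

The result approximates the float matmul up to quantization error, which is dominated by the 4-bit weights. An optimized kernel would typically fold the zero-point correction into a precomputed term rather than subtract it per element.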
Test plan
pytest test/quantization/test_quant_api.py -k test_8da4w_cpu
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2128
:white_check_mark: No Failures
As of commit e3731f720f2dd7da50f6ef37bbdbb53895fa5b6b with merge base 4ebc9c042565e16af249b7cec8ebb2dc9fa0274f:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@leslie-fang-intel This PR is updated to use a new layout. Please review again. Thanks.
Hi @jerryzh168 Could you please review this PR? Thanks.
Hi @leslie-fang-intel Please review this PR again. I have also added the kernel code in this PR. It showed reasonable performance in internal benchmarks. Thanks.
Please also describe how we choose different implementations based on the CPU Info.
I have added more details in the description. Thanks.
Hi @jerryzh168 Could you please review this PR? Thanks. It's changed a lot since your last review.