neural-compressor
Extend SmoothQuant support (exclude nodes, fuse into layernorm)
Type of Change
As per title.
Description
This PR includes two new features for SmoothQuant (that I was too lazy to split into two PRs):
- Add the fusion of `MatMul -> Add -> MatMul` into `MatMul -> Add` (typically the case for layernorm), as in https://github.com/mit-han-lab/smoothquant/blob/78badc0d975506de9fe44b2fe79d9a35d0fd4914/smoothquant/smooth.py#L46
- Add a parameter `nodes_to_exclude`, following the ORT quantizer & Optimum quantizer convention, to allow excluding some nodes from being smoothed. This follows https://github.com/mit-han-lab/smoothquant/issues/15#issuecomment-1353390283 (out_proj & fc2 should not be smoothed in order to reproduce the paper results).
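For reviewers, here is a minimal sketch of the idea behind both changes, not the code in this PR. It assumes the trailing op in the fused pattern acts as a per-output-channel division by the smoothing scale, which can be folded into the preceding weight and bias. The helper names (`matmul_add`, `fold_scale`, `nodes_to_smooth`) and the node names `q_proj` and `fc1` are hypothetical and only serve the illustration.

```python
import math


def matmul_add(x, weight, bias):
    """Compute x @ weight + bias on nested lists (weight is [in][out])."""
    columns = list(zip(*weight))
    return [
        [sum(a * w for a, w in zip(row, col)) + b for col, b in zip(columns, bias)]
        for row in x
    ]


def fold_scale(weight, bias, scale):
    """Fold a per-output-channel division by `scale` into weight and bias."""
    fused_weight = [[w / s for w, s in zip(row, scale)] for row in weight]
    fused_bias = [b / s for b, s in zip(bias, scale)]
    return fused_weight, fused_bias


def nodes_to_smooth(node_names, nodes_to_exclude=()):
    """Keep only the nodes that are not explicitly excluded from smoothing."""
    excluded = set(nodes_to_exclude)
    return [name for name in node_names if name not in excluded]


x = [[1.0, 2.0], [3.0, -1.0]]
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
bias = [0.1, -0.2, 0.3]
scale = [2.0, 0.5, 4.0]

# Unfused graph: MatMul -> Add, then a separate per-channel division.
unfused = [[v / s for v, s in zip(row, scale)] for row in matmul_add(x, weight, bias)]

# Fused graph: the division is absorbed, leaving only MatMul -> Add.
fused_weight, fused_bias = fold_scale(weight, bias, scale)
fused = matmul_add(x, fused_weight, fused_bias)

outputs_match = all(
    math.isclose(u, f, rel_tol=1e-9, abs_tol=1e-12)
    for u_row, f_row in zip(unfused, fused)
    for u, f in zip(u_row, f_row)
)

# Exclude out_proj and fc2 from smoothing, as suggested in the linked issue.
kept = nodes_to_smooth(
    ["q_proj", "out_proj", "fc1", "fc2"], nodes_to_exclude=["out_proj", "fc2"]
)
```

A check along these lines (fused versus unfused outputs within a tolerance) could also be a starting point for a unit test of the fusion.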
How has this PR been tested?
Locally. I did not test that the output from the fusion matches, but I just reused the code from the `mul` method. Let me know if I should add tests.
@chensuyue @mengniwang95 @PenghuiCheng @xin3he happy to get a review on this one!
There are some issues detected by CI, could you have a look?
https://dev.azure.com/lpot-inc/neural-compressor/_build/results?buildId=21391&view=logs&j=e5896b99-a49d-517b-218b-3b918f0c116d&t=b18c099a-26ee-5571-9980-67a803d9b7da&l=20474
https://dev.azure.com/lpot-inc/neural-compressor/_build/results?buildId=21391&view=logs&j=c6aa4c58-99e4-54e9-e3eb-cd322b75c938&t=dd753db2-b82a-57eb-728f-6b88742237f1&l=6311
Thank you @chensuyue, will have a look!
Further, could you add a unit test (UT) to make sure it works correctly?
@fxmarty would you like to follow this PR?
We will have a code freeze on 11/22. If this PR can be merged before that date, it can be packaged into the v2.4 release.
@fxmarty will you fix the PR?
@chensuyue Sorry, I did not get time to fix it, and unfortunately I won't be able to before the release.
Closing the PR for now since it has been pending for a long time. Feel free to reopen it when you have time to handle the issue.