neural-compressor Extend SmoothQuant support (exclude nodes, fuse into layernorm)

Open fxmarty opened this issue 1 year ago • 8 comments

Type of Change

As per title.

Description

This PR includes two new features for SmoothQuant (that I was too lazy to split into two PRs):

- Fuse MatMul -> Add -> MatMul into MatMul -> Add (typically the case for layernorm), as done in https://github.com/mit-han-lab/smoothquant/blob/78badc0d975506de9fe44b2fe79d9a35d0fd4914/smoothquant/smooth.py#L46
- Add a nodes_to_exclude parameter, in the style of the ORT quantizer and the Optimum quantizer, so that some nodes can be excluded from smoothing. This follows https://github.com/mit-han-lab/smoothquant/issues/15#issuecomment-1353390283 (out_proj and fc2 should not be smoothed in order to reproduce the paper results).
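
For illustration only, here is a minimal NumPy sketch of how excluding nodes could interact with the per-channel smoothing scale. It is not the code in this PR: the helper names (smooth_scales, select_nodes_to_smooth), the node names, and the (in_features, out_features) weight layout are assumptions made for the example. The scale formula is the one from the SmoothQuant paper, max|X_j|^alpha / max|W_j|^(1 - alpha).

```python
import numpy as np


def smooth_scales(act_absmax, weight, alpha=0.5, eps=1e-5):
    """Per-input-channel SmoothQuant scale (hypothetical helper).

    act_absmax: (in_features,) max absolute activation per channel.
    weight:     (in_features, out_features), layout assumed for this sketch.
    """
    w_absmax = np.maximum(np.abs(weight).max(axis=1), eps)
    scale = np.power(act_absmax, alpha) / np.power(w_absmax, 1.0 - alpha)
    return np.maximum(scale, eps)


def select_nodes_to_smooth(matmul_nodes, nodes_to_exclude=None):
    """Drop excluded node names while keeping the original order."""
    excluded = set(nodes_to_exclude or [])
    return [name for name in matmul_nodes if name not in excluded]


# Hypothetical node names; real ONNX node names will differ.
matmul_nodes = ["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"]
to_smooth = select_nodes_to_smooth(matmul_nodes, nodes_to_exclude=["out_proj", "fc2"])

# Tiny worked example of the scale formula with alpha = 0.5.
act_absmax = np.array([4.0, 1.0])
weight = np.array([[1.0, -2.0], [0.5, 0.25]])
scale = smooth_scales(act_absmax, weight, alpha=0.5)
```

Only the nodes left in to_smooth would then have their activations and weights rescaled; the excluded ones are left untouched.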

How has this PR been tested?

Locally. I did not test that the outputs of the fusion match; I just reused the code from the mul method. Let me know if I should add tests.
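
As a rough idea of what such a check could look like, the sketch below verifies numerically that folding the smoothing scale into the layernorm affine parameters leaves the output unchanged. It is a standalone NumPy example, not a test against neural-compressor. It assumes the layernorm ends in a multiply by gamma and an add of beta, and that the following MatMul weight has shape (hidden, out_features). All variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, out_features = 8, 4

x_norm = rng.standard_normal((3, hidden))      # normalized activations, before the affine part
gamma = rng.standard_normal(hidden)            # layernorm weight
beta = rng.standard_normal(hidden)             # layernorm bias
weight = rng.standard_normal((hidden, out_features))
scale = rng.uniform(0.5, 2.0, size=hidden)     # per-channel smoothing scale

# Reference: layernorm affine followed by the MatMul.
reference = (x_norm * gamma + beta) @ weight

# Fused: divide gamma and beta by the scale, multiply the matching weight rows by it.
fused = (x_norm * (gamma / scale) + beta / scale) @ (weight * scale[:, None])

max_abs_diff = np.abs(reference - fused).max()
print(f"max abs difference: {max_abs_diff:.3e}")
```

The two results should agree up to floating-point rounding, since dividing the affine output by the scale and multiplying the weight rows by the same scale cancel out.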

Oct 27 '23 13:10 fxmarty

@chensuyue @mengniwang95 @PenghuiCheng @xin3he happy to get a review on this one!

Oct 27 '23 13:10 fxmarty

There are some issues detected by CI, could you have a look?
https://dev.azure.com/lpot-inc/neural-compressor/_build/results?buildId=21391&view=logs&j=e5896b99-a49d-517b-218b-3b918f0c116d&t=b18c099a-26ee-5571-9980-67a803d9b7da&l=20474
https://dev.azure.com/lpot-inc/neural-compressor/_build/results?buildId=21391&view=logs&j=c6aa4c58-99e4-54e9-e3eb-cd322b75c938&t=dd753db2-b82a-57eb-728f-6b88742237f1&l=6311

Oct 31 '23 05:10 chensuyue

Thank you @chensuyue, will have a look!

Oct 31 '23 08:10 fxmarty

Also, could you add a unit test to make sure it works correctly?

Nov 07 '23 04:11 mengniwang95

@fxmarty would you like to follow up on this PR?

Nov 16 '23 06:11 chensuyue

We will have a code freeze on 11/22. If this PR can be merged before that date, it can be included in the v2.4 release.

Nov 16 '23 06:11 chensuyue

@fxmarty will you fix the PR?

Nov 21 '23 14:11 chensuyue

@chensuyue Sorry, I did not get time to fix it, and unfortunately I won't be able to before the release.

Nov 21 '23 14:11 fxmarty

Closing the PR for now since it has been pending for a long time. Feel free to reopen it when you have time to handle the issue.

May 21 '24 06:05 chensuyue
