
Scale estimation/rectification for int4 compression

Open andreyanufr opened this issue 1 year ago • 2 comments

Changes

Added scale estimation for weight compression, which minimizes the L2 error between the output of the original MatMul and that of the compressed one.
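The idea can be illustrated with a hedged, self-contained sketch. This is not NNCF's actual implementation: the PR minimizes the L2 error of the MatMul output, while this toy version minimizes only the weight-reconstruction error of one int4-symmetric quantization group via a grid search over a scale-rectification factor. All names here are illustrative.

```python
import numpy as np

def quant_dequant_int4_sym(w: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric int4 quantize-dequantize: integer levels in [-8, 7]."""
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

def estimate_scale(w_group: np.ndarray, n_steps: int = 100) -> float:
    """Grid-search a multiplier for the naive max-abs scale that
    minimizes the L2 reconstruction error of the group."""
    base_scale = np.abs(w_group).max() / 7  # naive symmetric scale
    best_scale, best_err = base_scale, np.inf
    for alpha in np.linspace(0.5, 1.0, n_steps):  # alpha=1.0 is the naive scale
        scale = base_scale * alpha
        err = np.sum((w_group - quant_dequant_int4_sym(w_group, scale)) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

rng = np.random.default_rng(0)
w = rng.standard_normal(64)  # one quantization group (e.g. gs=64)
naive_scale = np.abs(w).max() / 7
err_naive = np.sum((w - quant_dequant_int4_sym(w, naive_scale)) ** 2)
err_est = np.sum((w - quant_dequant_int4_sym(w, estimate_scale(w))) ** 2)
assert err_est <= err_naive  # the searched scale is never worse than naive
```

Since the naive scale itself is one of the grid candidates, the estimated scale can only match or improve the reconstruction error.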

Reason for changes

Increases accuracy of models compressed to 4 bits.

Related tickets

CVS-129177

Tests

In process

andreyanufr avatar Mar 06 '24 08:03 andreyanufr

Codecov Report

Attention: Patch coverage is 8.36653%, with 230 lines in your changes missing coverage. Please review.

Project coverage is 29.95%. Comparing base (17a5b65) to head (f06095e). Report is 5 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff              @@
##           develop    #2549       +/-   ##
============================================
- Coverage    91.19%   29.95%   -61.24%     
============================================
  Files          493      494        +1     
  Lines        45468    45775      +307     
============================================
- Hits         41464    13713    -27751     
- Misses        4004    32062    +28058     
| Files | Coverage Δ |
|---|---|
| nncf/quantization/advanced_parameters.py | 84.06% <100.00%> (-7.91%) :arrow_down: |
| ...ntization/algorithms/weight_compression/backend.py | 0.00% <ø> (-100.00%) :arrow_down: |
| nncf/openvino/quantization/quantize_model.py | 0.00% <0.00%> (-61.30%) :arrow_down: |
| ...ion/algorithms/weight_compression/torch_backend.py | 0.00% <0.00%> (-84.11%) :arrow_down: |
| nncf/torch/quantization/quantize_model.py | 0.00% <0.00%> (-92.50%) :arrow_down: |
| nncf/quantization/quantize_model.py | 34.78% <12.50%> (-42.67%) :arrow_down: |
| ...ization/algorithms/weight_compression/algorithm.py | 0.00% <0.00%> (-96.49%) :arrow_down: |
| .../quantization/algorithms/weight_compression/awq.py | 0.00% <0.00%> (-93.34%) :arrow_down: |
| ...n/algorithms/weight_compression/weight_lowering.py | 0.00% <0.00%> (-97.71%) :arrow_down: |
| .../algorithms/weight_compression/openvino_backend.py | 0.00% <0.00%> (-98.34%) :arrow_down: |

... and 1 more

... and 319 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| COMMON | ? |
| ONNX | ? |
| OPENVINO | ? |
| TENSORFLOW | 29.95% <8.36%> (-0.16%) :arrow_down: |
| TORCH | ? |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| common | 76.35% <ø> (-17.42%) :arrow_down: |
| torch | 0.01% <0.00%> (-93.59%) :arrow_down: |
| tensorflow | 93.74% <ø> (ø) |
| onnx | 0.00% <ø> (-93.07%) :arrow_down: |
| openvino | 0.00% <0.00%> (-94.19%) :arrow_down: |
| ptq | 15.26% <8.43%> (-74.80%) :arrow_down: |

codecov[bot] avatar Mar 06 '24 08:03 codecov[bot]

lambada-openai

| model | precision | acc | ppl |
|---|---|---|---|
| stabilityai_stablelm-2-zephyr-1_6b | fp32 | 0.5925 | 6.3024 |
| stabilityai_stablelm-2-zephyr-1_6b | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.5696 | 7.4355 |
| stabilityai_stablelm-2-zephyr-1_6b | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.5467 | 7.9706 |
| stabilityai_stablelm-2-zephyr-1_6b | int4_sym_r10_gs64_max_activation_variance | 0.5428 | 8.5844 |
| stabilityai_stablelm-3b-4e1t | fp16 | 0.7132 | 3.8192 |
| stabilityai_stablelm-3b-4e1t | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.6936 | 4.0961 |
| stabilityai_stablelm-3b-4e1t | int4_sym_r10_gs64_max_activation_variance | 0.685 | 4.324 |
| stabilityai_stablelm-3b-4e1t | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.6798 | 4.4316 |
| stable-zephyr-3b-dpo | fp16 | 0.6099 | 6.7151 |
| stable-zephyr-3b-dpo | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.5921 | 7.0513 |
| stable-zephyr-3b-dpo | CompressWeightsModeINT4_SYM_r10_gs64_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.5736 | 8.3502 |
| stable-zephyr-3b-dpo | int4_sym_r10_gs64_max_activation_variance | 0.5618 | 9.3011 |
| llama-2-7b-chat | fp16 | 0.7108 | 3.262 |
| llama-2-7b-chat | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.6911 | 3.5074 |
| llama-2-7b-chat | int4_sym_r10_gs128_max_activation_variance | 0.6885 | 3.5719 |
| llama-2-7b-chat | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.6798 | 3.6947 |
| zephyr-7b-beta | fp16 | 0.7345 | 3.1783 |
| zephyr-7b-beta | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_ffn_scale | 0.7297 | 3.2551 |
| zephyr-7b-beta | CompressWeightsModeINT4_SYM_r10_gs128_SensitivityMetricMAX_ACTIVATION_VARIANCE_awq_fnn | 0.7074 | 3.4549 |
| zephyr-7b-beta | int4_sym_r10_gs128_max_activation_variance | 0.707 | 3.5021 |

andreyanufr avatar Mar 15 '24 09:03 andreyanufr

The scale estimation algorithm doesn't work for group_size=-1 and fails with no clear message (see attached screenshot). In the short term, an error about the unsupported parameter for scale estimation would be enough. BTW, AWQ works fine with group_size=-1.
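The short-term fix the reviewer suggests could look like the following hedged sketch. The function name and parameter names are illustrative, not actual NNCF API; the point is simply to reject the unsupported combination up front with an actionable message instead of a cryptic failure.

```python
def validate_compression_params(group_size: int, scale_estimation: bool) -> None:
    """Illustrative validation: scale estimation requires group-wise
    quantization, so group_size=-1 (per-channel) is rejected early."""
    if scale_estimation and group_size == -1:
        raise ValueError(
            "Scale estimation is not supported for group_size=-1 "
            "(per-channel quantization). Disable scale estimation or "
            "use a positive group size, e.g. 64 or 128."
        )

# The invalid combination now fails with a clear message:
try:
    validate_compression_params(group_size=-1, scale_estimation=True)
except ValueError as e:
    print(e)

# Valid combinations pass through silently:
validate_compression_params(group_size=64, scale_estimation=True)
```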

ljaljushkin avatar Apr 19 '24 19:04 ljaljushkin