
Represent symmetrically quantized weights in signed data type

Open l-bat opened this issue 1 year ago • 5 comments

Changes

Represent symmetrically quantized weights in a signed data type with no zero point.

Reason for changes

  • To make the quantization type detectable without analyzing zero-point values.
  • A signed data type for symmetrically quantized weights yields a smaller footprint (no zero-point tensor is stored), especially in the case of grouped quantization.

Related tickets

130625

Tests

Updated: tests/torch/ptq/test_weights_compression.py and tests/openvino/native/quantization/test_weights_compression.py

Merge after: https://github.com/openvinotoolkit/openvino/pull/24457

l-bat avatar Jan 29 '24 12:01 l-bat

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.18%. Comparing base (d06b174) to head (3e6c649). Report is 2 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##           develop    #2434       +/-   ##
============================================
+ Coverage    47.70%   91.18%   +43.47%     
============================================
  Files          483      483               
  Lines        46305    46363       +58     
============================================
+ Hits         22090    42274    +20184     
+ Misses       24215     4089    -20126     
Files Coverage Δ
nncf/parameters.py 100.00% <ø> (ø)
...ization/algorithms/weight_compression/algorithm.py 97.68% <ø> (+1.38%) :arrow_up:
...quantization/algorithms/weight_compression/gptq.py 94.87% <100.00%> (-0.07%) :arrow_down:
...n/algorithms/weight_compression/mixed_precision.py 98.11% <100.00%> (+0.01%) :arrow_up:
.../algorithms/weight_compression/openvino_backend.py 98.80% <100.00%> (+0.10%) :arrow_up:
.../algorithms/weight_compression/scale_estimation.py 92.52% <100.00%> (+0.75%) :arrow_up:
...ion/algorithms/weight_compression/torch_backend.py 84.71% <100.00%> (+84.71%) :arrow_up:
...n/algorithms/weight_compression/weight_lowering.py 95.13% <100.00%> (+0.21%) :arrow_up:
nncf/quantization/quantize_model.py 80.55% <ø> (+11.80%) :arrow_up:
nncf/torch/quantization/layers.py 95.97% <100.00%> (+57.64%) :arrow_up:
... and 1 more

... and 296 files with indirect coverage changes

Flag Coverage Δ
COMMON 42.02% <0.00%> (-0.13%) :arrow_down:
ONNX 34.19% <7.20%> (-0.03%) :arrow_down:
OPENVINO 40.85% <84.00%> (+0.03%) :arrow_up:
TENSORFLOW 29.42% <0.00%> (?)
TORCH 65.42% <41.60%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
common 93.55% <ø> (+24.22%) :arrow_up:
torch 93.65% <100.00%> (+60.59%) :arrow_up:
tensorflow 93.26% <ø> (+93.26%) :arrow_up:
onnx 93.06% <ø> (ø)
openvino 94.47% <100.00%> (+0.02%) :arrow_up:
ptq 90.44% <100.00%> (+9.47%) :arrow_up:

codecov[bot] avatar Jan 29 '24 12:01 codecov[bot]

Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

xiao1228 avatar Jan 30 '24 14:01 xiao1228

> Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

No, this feature will only be enabled for weight compression via NNCF.

l-bat avatar Jan 31 '24 10:01 l-bat

> Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

> No, this feature will only be enabled for weight compression via NNCF.

Are you planning to extend this and enable it for the GPTQ model? It would be very helpful. The current GPTQ model has a per-tensor zero point and u4 weights, so it would make sense to save it as symmetric i4.

xiao1228 avatar Jan 31 '24 11:01 xiao1228

> Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

> No, this feature will only be enabled for weight compression via NNCF.

> Are you planning to extend this and enable it for the GPTQ model? It would be very helpful. The current GPTQ model has a per-tensor zero point and u4 weights, so it would make sense to save it as symmetric i4.

I created ticket 131500 to support symmetrically quantized weights in a signed data type for GPTQ.

l-bat avatar Feb 01 '24 14:02 l-bat
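For context on the conversion discussed above: when the unsigned zero point sits at the midpoint of the range (8 for u4), moving to a signed representation is a plain shift with the same scale. A minimal NumPy sketch (illustrative only, not NNCF or GPTQ code; the values are hypothetical):

```python
import numpy as np

# Hypothetical u4 weights with a per-tensor zero point at the range midpoint.
u4 = np.array([0, 8, 15, 3], dtype=np.uint8)  # unsigned 4-bit levels in [0, 15]
zero_point = 8                                # midpoint zero point
scale = 0.1

# Shift to signed 4-bit: same real values, no zero point needed afterwards.
i4 = u4.astype(np.int8) - zero_point          # range becomes [-8, 7]

# Both representations dequantize to identical values:
real_from_u4 = (u4.astype(np.float32) - zero_point) * scale
real_from_i4 = i4.astype(np.float32) * scale
```

A zero point that is not exactly at the midpoint cannot be folded away like this, which is why the general GPTQ case needs separate support (tracked in ticket 131500 above).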

Develop

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 37513 | 0:05:50 | 0:07:52 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 39704 | 0:05:43 | 0:07:10 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 188 | 124 | 29770 | 0:01:39 | 0:03:04 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 188 | 124 | 30731 | 0:01:09 | 0:03:11 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 228 | 84 | 6588 | 0:00:31 | 0:02:33 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33420 | 0:00:05 | 0:02:49 |

l-bat:lt/wc_sym_signed

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 37752 | 0:06:03 | 0:08:06 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 39638 | 0:05:52 | 0:07:19 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 94 | 124 | 29995 | 0:01:47 | 0:03:13 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 94 | 124 | 30762 | 0:01:16 | 0:03:19 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 114 | 84 | 6574 | 0:00:37 | 0:02:38 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33338 | 0:00:05 | 0:02:47 |

l-bat avatar May 22 '24 13:05 l-bat

ci job: 23

l-bat avatar Jun 07 '24 11:06 l-bat