Represent symmetrically quantized weights in signed data type
Changes
Represent symmetrically quantized weights in a signed data type with no zero point.
Reason for changes
- To detect the quantization type without analyzing zero-point values
- A signed data type for symmetrically quantized weights leads to a smaller footprint, especially in the case of grouped quantization (see the sketch after this list).
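For illustration, below is a minimal NumPy sketch (not the actual NNCF implementation; the function names and the group size are made up) contrasting symmetric quantization stored in the signed INT4 range, which needs only a per-group scale, with asymmetric quantization stored in the unsigned range, which additionally needs a per-group zero point:

```python
import numpy as np

def quantize_symmetric_int4(weight: np.ndarray, group_size: int = 32):
    """Per-group symmetric quantization into the signed INT4 range [-8, 7].
    Only a scale per group is stored; no zero point is needed, so the signed
    data type alone already signals that the weights are symmetric."""
    w = weight.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.maximum(scale, 1e-12)  # guard against all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def quantize_asymmetric_uint4(weight: np.ndarray, group_size: int = 32):
    """Per-group asymmetric quantization into the unsigned range [0, 15].
    Both a scale and a zero point per group are stored."""
    w = weight.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / 15.0, 1e-12)
    zero_point = np.clip(np.round(-w_min / scale), 0, 15).astype(np.uint8)
    q = np.clip(np.round(w / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point

# Example: a 4096x4096 layer with group_size=32 has 524288 groups, so the
# asymmetric variant stores 524288 extra zero-point values.
w = np.random.randn(4096, 4096).astype(np.float32)
q_sym, s_sym = quantize_symmetric_int4(w)
q_asym, s_asym, zp = quantize_asymmetric_uint4(w)
```

With grouped quantization the zero-point tensor has one value per group, so dropping it in the symmetric case reduces the compressed model footprint, and the quantization type can be inferred from the storage data type instead of from the zero-point values.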
Related tickets
130625
Tests
Updated: tests/torch/ptq/test_weights_compression.py
and tests/openvino/native/quantization/test_weights_compression.py
Merge after: https://github.com/openvinotoolkit/openvino/pull/24457
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.18%. Comparing base (d06b174) to head (3e6c649). Report is 2 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #2434 +/- ##
============================================
+ Coverage 47.70% 91.18% +43.47%
============================================
Files 483 483
Lines 46305 46363 +58
============================================
+ Hits 22090 42274 +20184
+ Misses 24215 4089 -20126
Files | Coverage Δ | |
---|---|---|
nncf/parameters.py | 100.00% <ø> (ø) | |
...ization/algorithms/weight_compression/algorithm.py | 97.68% <ø> (+1.38%) | :arrow_up: |
...quantization/algorithms/weight_compression/gptq.py | 94.87% <100.00%> (-0.07%) | :arrow_down: |
...n/algorithms/weight_compression/mixed_precision.py | 98.11% <100.00%> (+0.01%) | :arrow_up: |
.../algorithms/weight_compression/openvino_backend.py | 98.80% <100.00%> (+0.10%) | :arrow_up: |
.../algorithms/weight_compression/scale_estimation.py | 92.52% <100.00%> (+0.75%) | :arrow_up: |
...ion/algorithms/weight_compression/torch_backend.py | 84.71% <100.00%> (+84.71%) | :arrow_up: |
...n/algorithms/weight_compression/weight_lowering.py | 95.13% <100.00%> (+0.21%) | :arrow_up: |
nncf/quantization/quantize_model.py | 80.55% <ø> (+11.80%) | :arrow_up: |
nncf/torch/quantization/layers.py | 95.97% <100.00%> (+57.64%) | :arrow_up: |
... and 1 more | | |

... and 296 files with indirect coverage changes
Flag | Coverage Δ | |
---|---|---|
COMMON | 42.02% <0.00%> (-0.13%) | :arrow_down: |
ONNX | 34.19% <7.20%> (-0.03%) | :arrow_down: |
OPENVINO | 40.85% <84.00%> (+0.03%) | :arrow_up: |
TENSORFLOW | 29.42% <0.00%> (?) | |
TORCH | 65.42% <41.60%> (?) | |
Flags with carried forward coverage won't be shown.
Components | Coverage Δ | |
---|---|---|
common | 93.55% <ø> (+24.22%) | :arrow_up: |
torch | 93.65% <100.00%> (+60.59%) | :arrow_up: |
tensorflow | 93.26% <ø> (+93.26%) | :arrow_up: |
onnx | 93.06% <ø> (ø) | |
openvino | 94.47% <100.00%> (+0.02%) | :arrow_up: |
ptq | 90.44% <100.00%> (+9.47%) | :arrow_up: |
Thank you for this feature. Just wondering, is a GPTQ model going to be automatically saved as i4 as well? For example: "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.
No, this feature will only be enabled for weight compression via NNCF.
Are you planning to extend it and enable this for GPTQ models? It would be very helpful. The current GPTQ model has a per-tensor zero point and u4 weights, so it would make sense to save it as i4 symmetric.
I created ticket 131500 to support representing symmetrically quantized weights in a signed data type for GPTQ.
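For the point above, here is a small illustrative sketch (a hypothetical helper, not NNCF or GPTQ API) of how u4 codes with a zero point at the midpoint of the range could be folded into i4 symmetric codes without changing the dequantized values, since scale * (q_u4 - 8) equals scale * q_i4:

```python
import numpy as np

def u4_midpoint_zp_to_i4(q_u4: np.ndarray, zero_point: int = 8) -> np.ndarray:
    """Shift unsigned 4-bit codes [0, 15] with zero point 8 into signed
    codes [-8, 7]; the scale tensor stays as-is and the zero-point tensor
    can be dropped entirely."""
    if zero_point != 8:
        raise ValueError("Only a midpoint zero point folds away losslessly")
    return q_u4.astype(np.int8) - np.int8(zero_point)
```

A zero point that is not exactly at the midpoint of the range cannot be folded away losslessly this way.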
Develop
Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
---|---|---|---|---|---|---|---|---|---|
tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 37513 | 0:05:50 | 0:07:52 |
tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 39704 | 0:05:43 | 0:07:10 |
tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 188 | 124 | 29770 | 0:01:39 | 0:03:04 |
tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 188 | 124 | 30731 | 0:01:09 | 0:03:11 |
tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 228 | 84 | 6588 | 0:00:31 | 0:02:33 |
tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33420 | 0:00:05 | 0:02:49 |
l-bat:lt/wc_sym_signed
Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
---|---|---|---|---|---|---|---|---|---|
tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 37752 | 0:06:03 | 0:08:06 |
tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 39638 | 0:05:52 | 0:07:19 |
tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 94 | 124 | 29995 | 0:01:47 | 0:03:13 |
tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 94 | 124 | 30762 | 0:01:16 | 0:03:19 |
tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 114 | 84 | 6574 | 0:00:37 | 0:02:38 |
tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33338 | 0:00:05 | 0:02:47 |
ci job: 23