
Represent symmetrically quantized weights in signed data type

Open l-bat opened this issue 1 year ago • 5 comments

Changes

Represent symmetrically quantized weights in a signed data type with no zero point.

Reason for changes

  • To make the quantization type detectable without analyzing zero-point values.
  • A signed data type for symmetrically quantized weights yields a smaller footprint (no zero-point tensor is stored), especially in the case of grouped quantization.

Related tickets

130625

Tests

Updated: tests/torch/ptq/test_weights_compression.py and tests/openvino/native/quantization/test_weights_compression.py

Merge after: https://github.com/openvinotoolkit/openvino/pull/24457

l-bat avatar Jan 29 '24 12:01 l-bat

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.18%. Comparing base (d06b174) to head (3e6c649). Report is 2 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##           develop    #2434       +/-   ##
============================================
+ Coverage    47.70%   91.18%   +43.47%     
============================================
  Files          483      483               
  Lines        46305    46363       +58     
============================================
+ Hits         22090    42274    +20184     
+ Misses       24215     4089    -20126     
Files Coverage Δ
nncf/parameters.py 100.00% <ø> (ø)
...ization/algorithms/weight_compression/algorithm.py 97.68% <ø> (+1.38%) :arrow_up:
...quantization/algorithms/weight_compression/gptq.py 94.87% <100.00%> (-0.07%) :arrow_down:
...n/algorithms/weight_compression/mixed_precision.py 98.11% <100.00%> (+0.01%) :arrow_up:
.../algorithms/weight_compression/openvino_backend.py 98.80% <100.00%> (+0.10%) :arrow_up:
.../algorithms/weight_compression/scale_estimation.py 92.52% <100.00%> (+0.75%) :arrow_up:
...ion/algorithms/weight_compression/torch_backend.py 84.71% <100.00%> (+84.71%) :arrow_up:
...n/algorithms/weight_compression/weight_lowering.py 95.13% <100.00%> (+0.21%) :arrow_up:
nncf/quantization/quantize_model.py 80.55% <ø> (+11.80%) :arrow_up:
nncf/torch/quantization/layers.py 95.97% <100.00%> (+57.64%) :arrow_up:
... and 1 more

... and 296 files with indirect coverage changes

Flag Coverage Δ
COMMON 42.02% <0.00%> (-0.13%) :arrow_down:
ONNX 34.19% <7.20%> (-0.03%) :arrow_down:
OPENVINO 40.85% <84.00%> (+0.03%) :arrow_up:
TENSORFLOW 29.42% <0.00%> (?)
TORCH 65.42% <41.60%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
common 93.55% <ø> (+24.22%) :arrow_up:
torch 93.65% <100.00%> (+60.59%) :arrow_up:
tensorflow 93.26% <ø> (+93.26%) :arrow_up:
onnx 93.06% <ø> (ø)
openvino 94.47% <100.00%> (+0.02%) :arrow_up:
ptq 90.44% <100.00%> (+9.47%) :arrow_up:

codecov[bot] avatar Jan 29 '24 12:01 codecov[bot]

Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

xiao1228 avatar Jan 30 '24 14:01 xiao1228

> Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

No, this feature will only be enabled for weight compression via NNCF.

l-bat avatar Jan 31 '24 10:01 l-bat

> Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

> No, this feature will only be enabled for weight compression via NNCF.

Are you planning to extend this and enable it for the GPTQ model? It would be very helpful. The current GPTQ model has a per-tensor zero point and u4 weights, so it would make sense to save it as symmetric i4.

xiao1228 avatar Jan 31 '24 11:01 xiao1228

> Thank you for this feature! Just wondering: is a GPTQ model going to be automatically saved as i4 as well? For example, "TheBloke/Llama-2-7b-Chat-GPTQ", which is symmetrically quantized.

> No, this feature will only be enabled for weight compression via NNCF.

> Are you planning to extend this and enable it for the GPTQ model? It would be very helpful. The current GPTQ model has a per-tensor zero point and u4 weights, so it would make sense to save it as symmetric i4.

I created ticket 131500 to support symmetrically quantized weights in a signed data type for GPTQ.

l-bat avatar Feb 01 '24 14:02 l-bat
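For context on the conversion discussed above: when the unsigned zero point sits at the midpoint of the range (8 for u4), moving to a signed representation is a plain shift with the same scale. A minimal NumPy sketch (illustrative only, not NNCF or GPTQ code; the values are hypothetical):

```python
import numpy as np

# Hypothetical u4 weights with a per-tensor zero point at the range midpoint.
u4 = np.array([0, 8, 15, 3], dtype=np.uint8)  # unsigned 4-bit levels in [0, 15]
zero_point = 8                                # midpoint zero point
scale = 0.1

# Shift to signed 4-bit: same real values, no zero point needed afterwards.
i4 = u4.astype(np.int8) - zero_point          # range becomes [-8, 7]

# Both representations dequantize to identical values:
real_from_u4 = (u4.astype(np.float32) - zero_point) * scale
real_from_i4 = i4.astype(np.float32) * scale
```

A zero point that is not exactly at the midpoint cannot be folded away like this, which is why the general GPTQ case needs separate support (tracked in ticket 131500 above).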

Develop

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 37513 | 0:05:50 | 0:07:52 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.8404 | -0.1596 | 188 | 124 | 39704 | 0:05:43 | 0:07:10 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 188 | 124 | 29770 | 0:01:39 | 0:03:04 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 188 | 124 | 30731 | 0:01:09 | 0:03:11 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 228 | 84 | 6588 | 0:00:31 | 0:02:33 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33420 | 0:00:05 | 0:02:49 |

l-bat:lt/wc_sym_signed

| Model | Backend | Metric name | Metric value | Metric diff | Num int4 | Num int8 | RAM MiB | Compr. time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| tinyllama_data_aware_awq_scale_estimation | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 37752 | 0:06:03 | 0:08:06 |
| tinyllama_data_aware_awq_scale_estimation_stateful | OV | Similarity | 0.84048 | -0.15952 | 94 | 124 | 39638 | 0:05:52 | 0:07:19 |
| tinyllama_data_aware_awq_stateful | OV | Similarity | 0.85259 | -0.14741 | 94 | 124 | 29995 | 0:01:47 | 0:03:13 |
| tinyllama_data_aware | OV | Similarity | 0.83853 | -0.16147 | 94 | 124 | 30762 | 0:01:16 | 0:03:19 |
| tinyllama_data_free | OV | Similarity | 0.72057 | -0.27943 | 114 | 84 | 6574 | 0:00:37 | 0:02:38 |
| tinyllama_int8_data_free | TORCH | Similarity | 0.95624 | -0.04376 | 0 | 312 | 33338 | 0:00:05 | 0:02:47 |

l-bat avatar May 22 '24 13:05 l-bat

ci job: 23

l-bat avatar Jun 07 '24 11:06 l-bat