nncf [PTQ] Add support of arbitrary batch size for PTQ

Changes

Add a new advanced boolean option for quantization, batchwise_statistics. When set to True, statistics for the supported algorithms (see below) are collected under the assumption that axis 0 of a tensor is the batch axis. When set to False, statistics are collected under the assumption that the tensor has no batch axis. When set to None, the statistics collection logic adapts to the batch_size of the provided dataset.
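
A minimal sketch of how the three values of the option could be resolved. The helper name `resolve_batchwise_statistics` is hypothetical, and the rule that None becomes True only when the dataset batch size is greater than 1 is an assumption made for illustration, not a confirmed description of the NNCF implementation.

```python
from typing import Optional


def resolve_batchwise_statistics(option: Optional[bool], batch_size: Optional[int]) -> bool:
    """Hypothetical helper: decide whether axis 0 is treated as the batch axis."""
    if option is not None:
        # An explicit True/False set by the user always wins.
        return option
    # Assumption: with option=None, adapt to the batch size of the dataset.
    return batch_size is not None and batch_size > 1
```

Under this assumption, a dataset with batch_size=128 and the option left at None would enable batchwise statistics, while batch_size=1 would not.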

These changes to statistics computation apply specifically to the MinMax and ChannelAlignment algorithms.
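
To illustrate why the batch axis assumption matters for min/max statistics, here is a small pure-Python sketch. It assumes a mean-of-min/max aggregation across samples; the function names and the toy data are illustrative and are not taken from NNCF.

```python
from statistics import mean
from typing import List, Tuple

Sample = List[float]  # one flattened sample
Batch = List[Sample]  # axis 0 is the batch axis


def min_max_batchwise(batches: List[Batch]) -> Tuple[float, float]:
    """Treat axis 0 as the batch axis: reduce each sample separately, then average."""
    mins = [min(sample) for batch in batches for sample in batch]
    maxs = [max(sample) for batch in batches for sample in batch]
    return mean(mins), mean(maxs)


def min_max_no_batch_axis(batches: List[Batch]) -> Tuple[float, float]:
    """Treat every collected tensor as a whole: reduce it entirely, then average."""
    mins = [min(min(sample) for sample in batch) for batch in batches]
    maxs = [max(max(sample) for sample in batch) for batch in batches]
    return mean(mins), mean(maxs)


samples = [[-1.0, 2.0], [-3.0, 4.0], [-5.0, 6.0], [-7.0, 8.0]]
bs1 = [[s] for s in samples]      # four batches of size 1
bs2 = [samples[:2], samples[2:]]  # two batches of size 2

print(min_max_batchwise(bs1))      # (-4.0, 5.0)
print(min_max_batchwise(bs2))      # (-4.0, 5.0), same as with batch size 1
print(min_max_no_batch_axis(bs2))  # (-5.0, 6.0), depends on the batch size
```

In this toy setup the batchwise variant gives the same result for any batch size, while ignoring the batch axis makes the statistics drift as the batch grows.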

Validation of the proposed changes on a wide range of models revealed a limitation: if a model contains operations whose output no longer carries the batch in axis 0, the statistics collected after such operations are imprecise.

Such cases are now detected and reported to the user with a warning that recommends either using batch size = 1 for the model or setting the batchwise_statistics option to False.
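
A rough sketch of what such a check could look like. The set of operation names, the function name, and the warning text are all hypothetical; the PR description does not list the actual operations or the exact message.

```python
import warnings
from typing import Iterable, Optional

# Hypothetical examples of operations after which axis 0 may stop being the batch axis.
BATCH_AXIS_BREAKING_OPS = {"reshape", "transpose", "flatten"}


def warn_if_batch_axis_may_be_lost(
    op_types: Iterable[str], batch_size: int, batchwise_statistics: Optional[bool]
) -> bool:
    """Warn (and return True) when batchwise statistics may be imprecise."""
    if batchwise_statistics is False or batch_size <= 1:
        # Axis 0 is not treated as a batch axis, so there is nothing to warn about.
        return False
    suspicious = sorted(set(op_types) & BATCH_AXIS_BREAKING_OPS)
    if not suspicious:
        return False
    warnings.warn(
        f"Model contains operations {suspicious} whose output may have no batch axis. "
        "Consider using batch size = 1 or setting batchwise_statistics to False."
    )
    return True
```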

The torch sample for mobilenet_v2 was updated to use batch_size=128 with a correspondingly recalculated subset_size. The conformance test was updated with the new options batch_size and dynamic_batch_shape. Calibrate.py was updated with a new batch_size option.
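
One way to think about the recalculated subset_size, assuming it counts batches rather than individual samples. The helper name and the choice to round up are assumptions, and the sample may compute the value differently; 300 is used only as an illustrative target number of calibration samples.

```python
import math


def subset_size_in_batches(num_samples: int, batch_size: int) -> int:
    """Hypothetical helper: number of batches needed to cover num_samples samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return math.ceil(num_samples / batch_size)


print(subset_size_in_batches(300, 1))    # 300
print(subset_size_in_batches(300, 128))  # 3
```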

Algorithm support for batch_size > 1:

Algorithm	Do results depend on batch_size?	Comments
MinMax	Relatively depends	"Relatively" means that the results depend on whether the assumption that the batch lies on axis 0 is correct. Overcoming this requires a batch axis determination algorithm.
FastBiasCorrection	Yes	Statistics are calculated incorrectly because the aggregator does not take the batch axis into account. Requires a batch axis determination algorithm.
BiasCorrection	Yes	Statistics are calculated incorrectly because the aggregator does not take the batch axis into account. Requires a batch axis determination algorithm.
ChannelAlignment	No	Checked on models from the conformance test: mobilenet_v2, mobilenet_v3
SmoothQuant	No	Checked on models from the conformance test: levit_128, visformer_small
PostTrainingQuantization	Yes	Requires a batch axis determination algorithm.

Reason for changes

Speeding up statistics collection. Speedup on the mobilenet_v2 sample (local measurements):

Backend	bs=1 (sec)	bs=16 (sec)	bs=128 (sec)
Torch	24	4	4
Torch CUDA	20	1	1
OpenVINO	9	4	5
ONNX	17	11	12
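
For convenience, the speedup factors implied by the table can be derived as follows. The timings are copied from the table above; the variable names are illustrative.

```python
# Statistics collection time in seconds for bs=1, bs=16, bs=128 (from the table above).
timings_sec = {
    "Torch": (24, 4, 4),
    "Torch CUDA": (20, 1, 1),
    "OpenVINO": (9, 4, 5),
    "ONNX": (17, 11, 12),
}

# Speedup relative to bs=1, rounded to two decimals: (bs=16, bs=128).
speedups = {
    backend: (round(bs1 / bs16, 2), round(bs1 / bs128, 2))
    for backend, (bs1, bs16, bs128) in timings_sec.items()
}

for backend, (s16, s128) in speedups.items():
    print(f"{backend}: x{s16} at bs=16, x{s128} at bs=128")
```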

Extend usage scenarios.

Related tickets

121650

Tests

Old tests were updated accordingly. New tests added: test_tensor_collector_batch_size and test_min_max.

Oct 13 '23 10:10 kshpv

Codecov Report

Attention: Patch coverage is 83.61582% with 29 lines in your changes are missing coverage. Please review.

Project coverage is 84.89%. Comparing base (7974023) to head (88653aa). Report is 2 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2197      +/-   ##
===========================================
- Coverage    91.19%   84.89%   -6.30%     
===========================================
  Files          492      494       +2     
  Lines        45100    45350     +250     
===========================================
- Hits         41127    38502    -2625     
- Misses        3973     6848    +2875

Files	Coverage Δ
nncf/common/graph/utils.py	`82.14% <100.00%> (+1.75%)`	:arrow_up:
nncf/common/quantization/initialization/range.py	`94.38% <100.00%> (-1.68%)`	:arrow_down:
nncf/common/tensor_statistics/aggregator.py	`98.55% <100.00%> (-1.45%)`	:arrow_down:
.../common/tensor_statistics/statistical_functions.py	`100.00% <100.00%> (ø)`
nncf/experimental/tensor/functions/numeric.py	`98.30% <100.00%> (-0.57%)`	:arrow_down:
...ncf/experimental/tensor/functions/numpy_numeric.py	`82.35% <100.00%> (-11.73%)`	:arrow_down:
...ncf/experimental/tensor/functions/torch_numeric.py	`98.12% <100.00%> (+0.01%)`	:arrow_up:
nncf/onnx/graph/metatypes/groups.py	`100.00% <100.00%> (ø)`
nncf/onnx/graph/metatypes/onnx_metatypes.py	`99.58% <100.00%> (ø)`
nncf/onnx/graph/node_utils.py	`97.40% <100.00%> (-0.36%)`	:arrow_down:
... and 30 more

... and 45 files with indirect coverage changes

Flag	Coverage Δ
COMMON	`44.15% <28.45%> (+0.22%)`	:arrow_up:
ONNX	`34.65% <66.10%> (-0.01%)`	:arrow_down:
OPENVINO	`∅ <ø> (∅)`
TENSORFLOW	`30.12% <20.33%> (+0.25%)`	:arrow_up:
TORCH	`65.95% <64.97%> (+0.18%)`	:arrow_up:

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
common	`93.14% <100.00%> (-0.63%)`	:arrow_down:
torch	`93.48% <100.00%> (-0.12%)`	:arrow_down:
tensorflow	`93.74% <ø> (ø)`
onnx	`93.02% <100.00%> (-0.03%)`	:arrow_down:
openvino	`25.75% <25.00%> (-68.33%)`	:arrow_down:
ptq	`69.89% <73.03%> (-20.33%)`	:arrow_down:

Oct 13 '23 11:10 codecov[bot]

@kshpv , please also update the documentation accordingly

Dec 21 '23 05:12 MaximProshin

Resolved the previous comments as no longer relevant.

Jan 18 '24 13:01 kshpv

conformance job with batch size = 10 - 254

Jan 19 '24 10:01 kshpv

@daniil-lyakhov thanks for the careful grammatical and lexical review!

Jan 19 '24 15:01 kshpv

PTQ conformance bs=1 - job 312
PTQ conformance bs=10 - job 313

Mar 05 '24 15:03 kshpv

E2E ONNX - 619 passed

Mar 07 '24 14:03 kshpv

LGTM. Please provide test results for weights compression.

build 35 passed

Mar 22 '24 06:03 kshpv
