
[PTQ] Add support of arbitrary batch size for PTQ

Open kshpv opened this issue 1 year ago • 7 comments

Changes

Add a new advanced boolean option for quantization, batchwise_statistics. When set to True, statistics for the supported algorithms (see below) are collected under the assumption that the 0-axis of a tensor is the batch axis. When set to False, statistics are collected under the assumption that the tensor has no batch axis. When set to None, the statistics collection logic adapts based on the batch_size of the provided dataset.
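To illustrate why the batch-axis assumption changes the collected statistics, here is a minimal sketch (not the actual NNCF implementation; the function name and aggregation scheme are hypothetical) of a mean-of-maxima aggregator. With batchwise collection every sample along the 0-axis contributes its own per-channel statistic; without it, each calibration tensor contributes a single statistic.

```python
import numpy as np

def mean_max_per_channel(tensors, batchwise):
    """Illustrative sketch of mean-based aggregation of per-channel
    abs-max statistics. Each tensor has shape (batch, channels, *spatial)."""
    per_item_maxes = []
    for t in tensors:
        spatial_axes = tuple(range(2, t.ndim))
        m = np.abs(t).max(axis=spatial_axes)  # shape: (batch, channels)
        if batchwise:
            # 0-axis is treated as a batch axis: every sample
            # contributes one per-channel statistic.
            per_item_maxes.extend(m)
        else:
            # No batch axis assumed: the whole tensor
            # contributes one per-channel statistic.
            per_item_maxes.append(m.max(axis=0))
    return np.mean(per_item_maxes, axis=0)
```

For a tensor holding several samples, the two modes generally disagree: averaging per-sample maxima yields a smaller (tighter) range estimate than averaging per-tensor maxima, which is why results can depend on batch_size when the batch axis is not accounted for.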

These adjustments in statistics computation apply specifically to the MinMax and ChannelAlignment algorithms.

During validation of the proposed changes on a wide range of models, a limitation was observed: if a model contains operations whose outputs no longer carry batch meaning on the 0-axis, then the statistics collected after such operations are imprecise.

Such cases are now detected and handled by emitting a warning that recommends either using batch size = 1 for the given model or setting the batchwise_statistics option to False.
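A hypothetical illustration of how an operation can destroy the batch meaning of the 0-axis (the shapes below are made up for the example): after a reshape that folds the batch dimension into another axis, or a transpose that moves it, per-0-axis statistics no longer correspond to per-sample statistics, so batchwise collection becomes imprecise downstream of such nodes.

```python
import numpy as np

x = np.zeros((4, 3, 8, 8))            # (batch=4, channels=3, H, W)

# Reshape folds batch and channels together:
flat = x.reshape(-1, 8, 8)            # (12, 8, 8): axis 0 mixes batch and channels

# Transpose moves the batch axis away from position 0:
perm = np.transpose(x, (1, 0, 2, 3))  # (3, 4, 8, 8): axis 0 is now channels
```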

The Torch sample for mobilenet_v2 was updated to batch_size=128 with a correspondingly recalculated subset_size. The conformance test was updated with the new options batch_size and dynamic_batch_shape. calibrate.py was updated with a new batch_size option.
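A sketch of why subset_size needs recalculation when the batch size changes, assuming (as the sample suggests, though the exact convention is an assumption here) that subset_size counts samples while the dataset yields batches; the helper name is hypothetical:

```python
import math

def num_statistics_iterations(subset_size, batch_size):
    # With a batched dataset, the statistics collector needs
    # ceil(subset_size / batch_size) iterations to observe
    # roughly subset_size individual samples.
    return math.ceil(subset_size / batch_size)
```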

Algorithm support for batch_size > 1:

| Algorithm | Do results depend on batch_size? | Comments |
| --- | --- | --- |
| MinMax | Partially | Results depend on the correctness of the assumption that the batch lies on the 0-axis. Overcoming this requires a batch-axis determination algorithm. |
| FastBiasCorrection | Yes | Statistics are calculated incorrectly because the aggregator does not account for the batch axis. Requires a batch-axis determination algorithm. |
| BiasCorrection | Yes | Statistics are calculated incorrectly because the aggregator does not account for the batch axis. Requires a batch-axis determination algorithm. |
| ChannelAlignment | No | Checked on models from the conformance test: mobilenet_v2, mobilenet_v3 |
| SmoothQuant | No | Checked on models from the conformance test: levit_128, visformer_small |
| PostTrainingQuantization | Yes | Requires a batch-axis determination algorithm |

Reason for changes

Speeding up statistics collection. Speed-up on the mobilenet_v2 sample (local measurements):

| Backend | bs=1 (sec) | bs=16 (sec) | bs=128 (sec) |
| --- | --- | --- | --- |
| Torch | 24 | 4 | 4 |
| Torch CUDA | 20 | 1 | 1 |
| OpenVINO | 9 | 4 | 5 |
| ONNX | 17 | 11 | 12 |

Extend usage scenarios.

Related tickets

121650

Tests

Old tests were updated accordingly. New tests added: test_tensor_collector_batch_size, test_min_max.

kshpv avatar Oct 13 '23 10:10 kshpv

Codecov Report

Attention: Patch coverage is 83.61582%, with 29 lines in your changes missing coverage. Please review.

Project coverage is 84.89%. Comparing base (7974023) to head (88653aa). Report is 2 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff             @@
##           develop    #2197      +/-   ##
===========================================
- Coverage    91.19%   84.89%   -6.30%     
===========================================
  Files          492      494       +2     
  Lines        45100    45350     +250     
===========================================
- Hits         41127    38502    -2625     
- Misses        3973     6848    +2875     
| Files | Coverage Δ |
| --- | --- |
| nncf/common/graph/utils.py | 82.14% <100.00%> (+1.75%) :arrow_up: |
| nncf/common/quantization/initialization/range.py | 94.38% <100.00%> (-1.68%) :arrow_down: |
| nncf/common/tensor_statistics/aggregator.py | 98.55% <100.00%> (-1.45%) :arrow_down: |
| .../common/tensor_statistics/statistical_functions.py | 100.00% <100.00%> (ø) |
| nncf/experimental/tensor/functions/numeric.py | 98.30% <100.00%> (-0.57%) :arrow_down: |
| ...ncf/experimental/tensor/functions/numpy_numeric.py | 82.35% <100.00%> (-11.73%) :arrow_down: |
| ...ncf/experimental/tensor/functions/torch_numeric.py | 98.12% <100.00%> (+0.01%) :arrow_up: |
| nncf/onnx/graph/metatypes/groups.py | 100.00% <100.00%> (ø) |
| nncf/onnx/graph/metatypes/onnx_metatypes.py | 99.58% <100.00%> (ø) |
| nncf/onnx/graph/node_utils.py | 97.40% <100.00%> (-0.36%) :arrow_down: |
| ... and 30 more | |

... and 45 files with indirect coverage changes

| Flag | Coverage Δ |
| --- | --- |
| COMMON | 44.15% <28.45%> (+0.22%) :arrow_up: |
| ONNX | 34.65% <66.10%> (-0.01%) :arrow_down: |
| OPENVINO | ∅ <ø> (∅) |
| TENSORFLOW | 30.12% <20.33%> (+0.25%) :arrow_up: |
| TORCH | 65.95% <64.97%> (+0.18%) :arrow_up: |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| common | 93.14% <100.00%> (-0.63%) :arrow_down: |
| torch | 93.48% <100.00%> (-0.12%) :arrow_down: |
| tensorflow | 93.74% <ø> (ø) |
| onnx | 93.02% <100.00%> (-0.03%) :arrow_down: |
| openvino | 25.75% <25.00%> (-68.33%) :arrow_down: |
| ptq | 69.89% <73.03%> (-20.33%) :arrow_down: |

codecov[bot] avatar Oct 13 '23 11:10 codecov[bot]

@kshpv , please also update the documentation accordingly

MaximProshin avatar Dec 21 '23 05:12 MaximProshin

Resolved the previous comments as no longer relevant.

kshpv avatar Jan 18 '24 13:01 kshpv

conformance job with batch size = 10 - 254

kshpv avatar Jan 19 '24 10:01 kshpv

@daniil-lyakhov thanks for the thorough grammatical and lexical review!

kshpv avatar Jan 19 '24 15:01 kshpv

PTQ conformance bs=1: job 312. PTQ conformance bs=10: job 313.

kshpv avatar Mar 05 '24 15:03 kshpv

E2E ONNX - 619 passed

kshpv avatar Mar 07 '24 14:03 kshpv

LGTM. Please provide test results for weights compression.

build 35 passed

kshpv avatar Mar 22 '24 06:03 kshpv