nncf
[PTQ] Add support of arbitrary batch size for PTQ
Changes
Add a new advanced bool option for quantization: batchwise_statistics.
When set to True, statistics for the supported algorithms (see below) are collected under the assumption that the 0-axis of a tensor is the batch axis.
When set to False, statistics are collected under the assumption that the tensor has no batch axis.
When set to None, the statistics collection logic adapts to the batch_size of the provided dataset.
These adjustments to statistics computation apply specifically to the MinMax and ChannelAlignment algorithms.
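The effect of the option can be illustrated with a small, self-contained sketch. It is not NNCF code: `resolve_batchwise` and `collect_min_max` are hypothetical helpers, the rule "None means batchwise when batch_size > 1" is an assumed reading of the description above, and the mean-of-min / mean-of-max aggregation is chosen only for illustration.

```python
from statistics import mean
from typing import List, Optional, Tuple

Sample = List[float]   # one flattened activation
Batch = List[Sample]   # axis 0 is the batch axis


def resolve_batchwise(option: Optional[bool], batch_size: int) -> bool:
    """Hypothetical resolution of the tri-state option (assumed rule for None)."""
    if option is None:
        return batch_size > 1
    return option


def collect_min_max(batches: List[Batch], batchwise: bool) -> Tuple[float, float]:
    """Average the observed minima and maxima over all observations."""
    mins, maxs = [], []
    for batch in batches:
        if batchwise:
            # Every slice along axis 0 is a separate observation.
            for sample in batch:
                mins.append(min(sample))
                maxs.append(max(sample))
        else:
            # The whole tensor is a single observation.
            flat = [value for sample in batch for value in sample]
            mins.append(min(flat))
            maxs.append(max(flat))
    return mean(mins), mean(maxs)


samples = [[-1.0, 2.0], [-3.0, 4.0], [-5.0, 6.0], [-7.0, 8.0]]
bs1 = [[s] for s in samples]          # four iterations, batch size 1
bs2 = [samples[0:2], samples[2:4]]    # two iterations, batch size 2

reference = collect_min_max(bs1, resolve_batchwise(None, 1))
batched = collect_min_max(bs2, resolve_batchwise(None, 2))
naive = collect_min_max(bs2, batchwise=False)
print(reference, batched, naive)
```

In this toy setup, the batchwise collection with batch size 2 reproduces the batch size 1 result, while ignoring the batch axis yields a wider range.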
During validation of the proposed changes on a wide range of models, a limitation was observed: if a model contains operations whose output no longer carries batch meaning on the tensor's batch axis, the statistics collected after such operations are imprecise.
Such cases are now handled by a warning message that recommends using batch size = 1 for the specific model or setting the batchwise_statistics option to False.
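A check of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name, the set of operation names, and the warning text are hypothetical and are not taken from the NNCF code base.

```python
import warnings
from typing import Iterable, List, Optional

# Hypothetical op names; the PR text does not list the affected operations.
OPS_THAT_MAY_DROP_BATCH_MEANING = frozenset({"reshape", "transpose", "flatten"})


def find_batch_axis_risks(
    op_names: Iterable[str],
    batch_size: int,
    batchwise_statistics: Optional[bool],
) -> List[str]:
    """Return the ops that may invalidate the batch-axis assumption, warning if any."""
    # Assumed rule: None means batchwise statistics when batch_size > 1.
    if batchwise_statistics is None:
        batchwise = batch_size > 1
    else:
        batchwise = batchwise_statistics
    # With batch size 1 or without the batch-axis assumption there is nothing to check.
    if not batchwise or batch_size == 1:
        return []
    risky = [name for name in op_names if name in OPS_THAT_MAY_DROP_BATCH_MEANING]
    if risky:
        warnings.warn(
            f"Operations {risky} may change the meaning of the batch axis; "
            "statistics collected after them may be imprecise. "
            "Consider batch size = 1 or batchwise_statistics=False.",
            UserWarning,
        )
    return risky
```

Calling `find_batch_axis_risks(["conv", "reshape"], 128, None)` in this sketch returns `["reshape"]` and emits the warning, while the same call with batch size 1 or with `batchwise_statistics=False` returns an empty list.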
The torch sample for mobilenet_v2 was updated with batch_size=128 and a recalculated subset_size value.
The conformance test was updated with new options batch_size and dynamic_batch_shape.
Calibrate.py was updated with a new option batch_size.
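For the mobilenet_v2 sample, the subset_size recalculation can be sketched as simple arithmetic. The sketch assumes that subset_size counts dataset iterations (batches) rather than individual samples and that the goal is to keep roughly the same number of calibration samples; the 300-sample target and the helper name are illustrative assumptions, not values taken from the sample.

```python
def subset_size_in_batches(target_samples: int, batch_size: int) -> int:
    """Number of dataset iterations needed to cover about target_samples samples."""
    if target_samples <= 0 or batch_size <= 0:
        raise ValueError("target_samples and batch_size must be positive")
    # Round down, but always run at least one iteration.
    return max(1, target_samples // batch_size)


for bs in (1, 16, 128):
    print(bs, subset_size_in_batches(300, bs))
```

With a batch size of 128, this rounds down to two iterations, so slightly fewer than 300 samples are used; rounding up instead is an equally reasonable choice.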
Algorithm support for batch_size > 1:
Algorithm | Do results depend on batch_size? | Comments |
---|---|---|
MinMax | Relatively | "Relatively" means that the results depend on the correctness of the assumption that the batch lies on the 0-axis. Overcoming this requires a batch axis determination algorithm |
FastBiasCorrection | Yes | Statistics are calculated incorrectly because the aggregator does not take the batch axis into account. Requires a batch axis determination algorithm |
BiasCorrection | Yes | Statistics are calculated incorrectly because the aggregator does not take the batch axis into account. Requires a batch axis determination algorithm |
ChannelAlignment | No | Checked on models from conformance test: mobilenet_v2, mobilenet_v3 |
SmoothQuant | No | Checked on models from conformance test: levit_128, visformer_small |
PostTrainingQuantization | Yes | Requires a batch axis determination algorithm |
Reason for changes
Speeding up statistics collection. Speedup on the mobilenet_v2 sample (local measurements):
Backend | bs=1 (sec) | bs=16 (sec) | bs=128 (sec) |
---|---|---|---|
Torch | 24 | 4 | 4 |
Torch CUDA | 20 | 1 | 1 |
OpenVINO | 9 | 4 | 5 |
ONNX | 17 | 11 | 12 |
Extend usage scenarios.
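The relative speedups implied by the timing table above can be derived directly from the reported seconds. The sketch below only restates those numbers; the dictionary layout and function name are illustrative.

```python
from typing import Dict

# Seconds per backend and batch size, copied from the table above.
timings: Dict[str, Dict[int, float]] = {
    "Torch": {1: 24, 16: 4, 128: 4},
    "Torch CUDA": {1: 20, 16: 1, 128: 1},
    "OpenVINO": {1: 9, 16: 4, 128: 5},
    "ONNX": {1: 17, 16: 11, 128: 12},
}


def speedups(row: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each batch size relative to bs=1, rounded to two decimals."""
    base = row[1]
    return {bs: round(base / seconds, 2) for bs, seconds in row.items() if bs != 1}


result = {backend: speedups(row) for backend, row in timings.items()}
for backend, row in result.items():
    print(backend, row)
```

The gain is largest for Torch CUDA (20x) and smallest for ONNX (about 1.4x to 1.5x), and in these measurements going from bs=16 to bs=128 gives no further improvement.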
Related tickets
121650
Tests
Old tests were updated accordingly. New tests added: test_tensor_collector_batch_size, test_min_max
Codecov Report
Attention: Patch coverage is 83.61582% with 29 lines in your changes are missing coverage. Please review.
Project coverage is 84.89%. Comparing base (7974023) to head (88653aa). Report is 2 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #2197 +/- ##
===========================================
- Coverage 91.19% 84.89% -6.30%
===========================================
Files 492 494 +2
Lines 45100 45350 +250
===========================================
- Hits 41127 38502 -2625
- Misses 3973 6848 +2875
Files | Coverage Δ | |
---|---|---|
nncf/common/graph/utils.py | 82.14% <100.00%> (+1.75%) | :arrow_up: |
nncf/common/quantization/initialization/range.py | 94.38% <100.00%> (-1.68%) | :arrow_down: |
nncf/common/tensor_statistics/aggregator.py | 98.55% <100.00%> (-1.45%) | :arrow_down: |
.../common/tensor_statistics/statistical_functions.py | 100.00% <100.00%> (ø) | |
nncf/experimental/tensor/functions/numeric.py | 98.30% <100.00%> (-0.57%) | :arrow_down: |
...ncf/experimental/tensor/functions/numpy_numeric.py | 82.35% <100.00%> (-11.73%) | :arrow_down: |
...ncf/experimental/tensor/functions/torch_numeric.py | 98.12% <100.00%> (+0.01%) | :arrow_up: |
nncf/onnx/graph/metatypes/groups.py | 100.00% <100.00%> (ø) | |
nncf/onnx/graph/metatypes/onnx_metatypes.py | 99.58% <100.00%> (ø) | |
nncf/onnx/graph/node_utils.py | 97.40% <100.00%> (-0.36%) | :arrow_down: |
... and 30 more |
... and 45 files with indirect coverage changes
Flag | Coverage Δ | |
---|---|---|
COMMON | 44.15% <28.45%> (+0.22%) | :arrow_up: |
ONNX | 34.65% <66.10%> (-0.01%) | :arrow_down: |
OPENVINO | ∅ <ø> (∅) | |
TENSORFLOW | 30.12% <20.33%> (+0.25%) | :arrow_up: |
TORCH | 65.95% <64.97%> (+0.18%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
Components | Coverage Δ | |
---|---|---|
common | 93.14% <100.00%> (-0.63%) | :arrow_down: |
torch | 93.48% <100.00%> (-0.12%) | :arrow_down: |
tensorflow | 93.74% <ø> (ø) | |
onnx | 93.02% <100.00%> (-0.03%) | :arrow_down: |
openvino | 25.75% <25.00%> (-68.33%) | :arrow_down: |
ptq | 69.89% <73.03%> (-20.33%) | :arrow_down: |
@kshpv, please also update the documentation accordingly.
Resolved previous comments as no longer relevant.
Conformance job with batch size = 10: 254
@daniil-lyakhov thanks for the accurate grammatical and lexical review!
PTQ conformance bs=1: job 312
PTQ conformance bs=10: job 313
E2E ONNX - 619 passed
LGTM. Please provide test results for weights compression.
build 35 passed