[PTQ][OV] BF16 support
Changes
- Added BF16 data type support
- Added FakeQuantize (FQ) parameter generation based on the tensor type
- Extended the list of supported types for OpenVINO input data with ov.Tensor (see the usage sketch below)
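Below is a minimal usage sketch, not part of this PR's diff: the model path, input shape, and number of calibration samples are illustrative, and a single-input model is assumed. It shows how calibration data can now be provided as ov.Tensor objects when quantizing a model whose weights are stored in BF16:

```python
import numpy as np
import openvino as ov
import nncf

core = ov.Core()
# Illustrative path to an IR that contains BF16 weights.
model = core.read_model("model_bf16.xml")

# Calibration samples prepared as numpy arrays (shape is illustrative).
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)]

# The transform function may now return ov.Tensor objects
# in addition to the previously supported input types.
def transform_fn(sample):
    return ov.Tensor(sample)

calibration_dataset = nncf.Dataset(samples, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)
```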
Reason for changes
- BF16 support
Related tickets
- 126782
Tests
- Updated existing tests with BF16.
- Edit: the tests were not updated due to the lack of BF16 precision support in OpenVINO.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.20%. Comparing base (f4bd077) to head (9471ac8).
Additional details and impacted files
Coverage Diff:

Metric | develop | #2307 | +/- |
---|---|---|---|
Coverage | 91.19% | 91.20% | |
Files | 483 | 483 | |
Lines | 46443 | 46435 | -8 |
Hits | 42355 | 42351 | -4 |
Misses | 4088 | 4084 | -4 |
Files | Coverage Δ |
---|---|
nncf/openvino/engine.py | 96.55% <ø> (ø) |
nncf/openvino/graph/model_transformer.py | 94.84% <100.00%> (+0.82%) :arrow_up: |
nncf/openvino/graph/node_utils.py | 98.80% <100.00%> (-0.02%) :arrow_down: |
nncf/openvino/graph/transformations/commands.py | 97.67% <100.00%> (+0.11%) :arrow_up: |
nncf/openvino/quantization/quantize_ifmodel.py | 100.00% <100.00%> (ø) |
.../algorithms/weight_compression/openvino_backend.py | 98.84% <100.00%> (-0.01%) :arrow_down: |
Flag | Coverage Δ |
---|---|
COMMON | 41.93% <0.00%> (+<0.01%) :arrow_up: |
ONNX | 34.19% <0.00%> (+<0.01%) :arrow_up: |
OPENVINO | 40.98% <100.00%> (-0.01%) :arrow_down: |
TENSORFLOW | 29.39% <0.00%> (+<0.01%) :arrow_up: |
TORCH | 65.11% <7.54%> (+0.01%) :arrow_up: |
Flags with carried forward coverage won't be shown.
Components | Coverage Δ |
---|---|
common | 93.54% <ø> (ø) |
torch | 93.65% <ø> (ø) |
tensorflow | 93.26% <ø> (ø) |
onnx | 93.06% <ø> (ø) |
openvino | 94.62% <100.00%> (+0.10%) :arrow_up: |
ptq | 90.50% <100.00%> (-0.01%) :arrow_down: |
@alexsu52, @l-bat, please review.
@l-bat, @alexsu52, @andrey-churkin, please review.
Compression time comparison. Numbers were collected on a local i9-10980XE.
Model | Backend | Compr. Time (develop) | Compr. Time (bf16 branch) |
---|---|---|---|
hf/bert-base-uncased | OV | 00:00:15 | 00:00:15 |
timm/crossvit_9_240 | OV | 00:00:20 | 00:00:20 |
timm/darknet53 | OV | 00:00:44 | 00:00:32 |
timm/deit3_small_patch16_224 | OV | 00:00:25 | 00:00:25 |
timm/dla34 | OV | 00:00:15 | 00:00:15 |
timm/dpn68 | OV | 00:00:16 | 00:00:16 |
timm/efficientnet_b0_BC | OV | 00:00:20 | 00:00:20 |
timm/efficientnet_b0 | OV | 00:00:12 | 00:00:12 |
timm/efficientnet_lite0 | OV | 00:00:11 | 00:00:11 |
timm/hrnet_w18 | OV | 00:01:20 | 00:01:11 |
timm/inception_resnet_v2 | OV | 00:01:23 | 00:01:13 |
timm/levit_128 | OV | 00:00:18 | 00:00:18 |
timm/mobilenetv2_050_BC | OV | 00:00:14 | 00:00:14 |
timm/mobilenetv2_050 | OV | 00:00:09 | 00:00:10 |
timm/mobilenetv3_small_050_BC | OV | 00:00:09 | 00:00:09 |
timm/mobilenetv3_small_050 | OV | 00:00:07 | 00:00:06 |
timm/regnetx_002 | OV | 00:00:09 | 00:00:08 |
timm/resnest14d | OV | 00:00:16 | 00:00:15 |
timm/resnet18 | OV | 00:00:10 | 00:00:10 |
timm/swin_base_patch4_window7_224 | OV | 00:01:18 | 00:01:18 |
timm/tf_inception_v3 | OV | 00:00:31 | 00:00:31 |
timm/vgg11 | OV | 00:00:31 | 00:00:30 |
timm/visformer_small | OV | 00:00:18 | 00:00:17 |
timm/wide_resnet50_2 | OV | 00:00:39 | 00:00:39 |
No degradations were observed.
I would suggest checking the accuracy and performance of the weight compression algorithms for FP32 and FP16 precision.
Here are the results of the weight compression validation on a local i9-10980XE machine:
Model | Backend | Metric name | Metric value (develop) | Metric value (bf16 branch) | Compr. Time (develop) | Compr. Time (bf16 branch) |
---|---|---|---|---|---|---|
tinyllama_data_aware | OV | Similarity | 0.83853 | 0.83853 | 00:01:26 | 00:01:26 |
tinyllama_data_free | OV | Similarity | 0.72057 | 0.72057 | 00:00:44 | 00:00:44 |
Also, here are the numbers from the examples/llm_compression/openvino/tiny_llama example.
[Develop and BF16 branch results: see attached screenshots]
There were no degradations observed. @alexsu52, how did you reproduce the issue?
I used a model with FP16 weights.
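To make the reproduction path explicit, here is a rough sketch of how an FP16-weight model can be obtained and then compressed. The paths are illustrative, and compress_to_fp16 is the standard OpenVINO model-saving option rather than anything introduced by this PR:

```python
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("bloomz-560m/openvino_model.xml")  # illustrative path

# Re-save the IR with the weights compressed to FP16.
ov.save_model(model, "bloomz-560m_fp16.xml", compress_to_fp16=True)

# Run NNCF weight compression on the FP16-weight model.
fp16_model = core.read_model("bloomz-560m_fp16.xml")
compressed_model = nncf.compress_weights(fp16_model)
```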
@andrey-churkin, @l-bat, @kshpv, @alexsu52, @daniil-lyakhov, @andreyanufr, please review.
FP16 model as input (bloomz-560m).
[Develop and branch results: see attached screenshots]