[PTQ][OV] BF16 support
### Changes

- Added BF16 type support.
- Added FQ parameters generation based on the type.
- Extended the list of supported types for OpenVINO input data with `ov.Tensor`.

### Reason for changes

- BF16 support.

### Related tickets

- 126782

### Tests

- Updated existing tests with BF16.
- Edit: tests were not updated due to the lack of BF16 precision support in OpenVINO.
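As background for generating FQ parameters based on the type: BF16 keeps float32's 8 exponent bits but only 7 mantissa bits, so float32 quantization thresholds lose precision when represented in BF16. Below is a minimal NumPy sketch of the standard round-to-nearest-even float32 → bfloat16 conversion; the function name and bit-level emulation are illustrative only, not NNCF's actual implementation:

```python
import numpy as np

def fp32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Emulate float32 -> bfloat16 rounding (round-to-nearest-even)
    by keeping only the upper 16 bits of the float32 bit pattern."""
    bits = x.astype(np.float32).view(np.uint32)
    # Add half of the dropped range; the extra lsb of the kept part
    # breaks ties toward even (standard RNE trick).
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded & 0xFFFF_0000).view(np.float32)

# A float32 FQ threshold such as 1.234567 is not representable in
# bf16 and snaps to the nearest 7-bit-mantissa value, 1.234375.
thresholds = np.array([-1.234567, 1.234567], dtype=np.float32)
print(fp32_to_bf16(thresholds))
```

This illustrates why FQ ranges computed in float32 cannot simply be reused verbatim for a BF16 model: each threshold must itself be representable in the target precision.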
## Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.20%. Comparing base (f4bd077) to head (9471ac8).
Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop   #2307   +/-   ##
=========================================
  Coverage    91.19%  91.20%
  Files          483     483
  Lines        46443   46435       -8
=========================================
- Hits         42355   42351       -4
+ Misses        4088    4084       -4
```
| Files | Coverage Δ | |
|---|---|---|
| nncf/openvino/engine.py | 96.55% <ø> (ø) | |
| nncf/openvino/graph/model_transformer.py | 94.84% <100.00%> (+0.82%) | :arrow_up: |
| nncf/openvino/graph/node_utils.py | 98.80% <100.00%> (-0.02%) | :arrow_down: |
| nncf/openvino/graph/transformations/commands.py | 97.67% <100.00%> (+0.11%) | :arrow_up: |
| nncf/openvino/quantization/quantize_ifmodel.py | 100.00% <100.00%> (ø) | |
| .../algorithms/weight_compression/openvino_backend.py | 98.84% <100.00%> (-0.01%) | :arrow_down: |
| Flag | Coverage Δ | |
|---|---|---|
| COMMON | 41.93% <0.00%> (+<0.01%) | :arrow_up: |
| ONNX | 34.19% <0.00%> (+<0.01%) | :arrow_up: |
| OPENVINO | 40.98% <100.00%> (-0.01%) | :arrow_down: |
| TENSORFLOW | 29.39% <0.00%> (+<0.01%) | :arrow_up: |
| TORCH | 65.11% <7.54%> (+0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
| Components | Coverage Δ | |
|---|---|---|
| common | 93.54% <ø> (ø) | |
| torch | 93.65% <ø> (ø) | |
| tensorflow | 93.26% <ø> (ø) | |
| onnx | 93.06% <ø> (ø) | |
| openvino | 94.62% <100.00%> (+0.10%) | :arrow_up: |
| ptq | 90.50% <100.00%> (-0.01%) | :arrow_down: |
@alexsu52, @l-bat, please, review.
@l-bat, @alexsu52, @andrey-churkin, review, please.
Compression time comparison. Numbers were collected on a local i9-10980XE.
| Model | Backend | Compr. Time (develop) | Compr. Time (bf16 branch) |
|---|---|---|---|
| hf/bert-base-uncased | OV | 00:00:15 | 00:00:15 |
| timm/crossvit_9_240 | OV | 00:00:20 | 00:00:20 |
| timm/darknet53 | OV | 00:00:44 | 00:00:32 |
| timm/deit3_small_patch16_224 | OV | 00:00:25 | 00:00:25 |
| timm/dla34 | OV | 00:00:15 | 00:00:15 |
| timm/dpn68 | OV | 00:00:16 | 00:00:16 |
| timm/efficientnet_b0_BC | OV | 00:00:20 | 00:00:20 |
| timm/efficientnet_b0 | OV | 00:00:12 | 00:00:12 |
| timm/efficientnet_lite0 | OV | 00:00:11 | 00:00:11 |
| timm/hrnet_w18 | OV | 00:01:20 | 00:01:11 |
| timm/inception_resnet_v2 | OV | 00:01:23 | 00:01:13 |
| timm/levit_128 | OV | 00:00:18 | 00:00:18 |
| timm/mobilenetv2_050_BC | OV | 00:00:14 | 00:00:14 |
| timm/mobilenetv2_050 | OV | 00:00:09 | 00:00:10 |
| timm/mobilenetv3_small_050_BC | OV | 00:00:09 | 00:00:09 |
| timm/mobilenetv3_small_050 | OV | 00:00:07 | 00:00:06 |
| timm/regnetx_002 | OV | 00:00:09 | 00:00:08 |
| timm/resnest14d | OV | 00:00:16 | 00:00:15 |
| timm/resnet18 | OV | 00:00:10 | 00:00:10 |
| timm/swin_base_patch4_window7_224 | OV | 00:01:18 | 00:01:18 |
| timm/tf_inception_v3 | OV | 00:00:31 | 00:00:31 |
| timm/vgg11 | OV | 00:00:31 | 00:00:30 |
| timm/visformer_small | OV | 00:00:18 | 00:00:17 |
| timm/wide_resnet50_2 | OV | 00:00:39 | 00:00:39 |
No degradations were observed.
> I would suggest checking the accuracy and performance of the weight compression algorithms for FP32 and FP16 precision.

Here are the results of weight compression validation on a local machine (i9-10980XE):

| Model | Backend | Metric name | Metric value (develop) | Metric value (bf16 branch) | Compr. Time (develop) | Compr. Time (bf16 branch) |
|---|---|---|---|---|---|---|
| tinyllama_data_aware | OV | Similarity | 0.83853 | 0.83853 | 00:01:26 | 00:01:26 |
| tinyllama_data_free | OV | Similarity | 0.72057 | 0.72057 | 00:00:44 | 00:00:44 |
Also, here are the numbers from the `examples/llm_compression/openvino/tiny_llama` example. Develop:
BF16 branch:
There were no degradations observed. @alexsu52, how did you reproduce the issue?
I used model with FP16 weights.
@andrey-churkin, @l-bat, @kshpv, @alexsu52, @daniil-lyakhov, @andreyanufr, review, please.
FP16 model as input (bloomz-560m). Develop:
BF16 branch: 