[PTQ][OV] BF16 support
Changes
- Added BF16 data type support
- Added FakeQuantize (FQ) parameter generation based on the tensor type
- Extended the list of supported types for OpenVINO input data with ov.Tensor (see the usage sketch below)
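Below is a minimal usage sketch, not part of this PR's diff: the model path, input shape, and number of calibration samples are illustrative, and a single-input model is assumed. It shows how calibration data can now be provided as ov.Tensor objects when quantizing a model whose weights are stored in BF16:

```python
import numpy as np
import openvino as ov
import nncf

core = ov.Core()
# Illustrative path to an IR that contains BF16 weights.
model = core.read_model("model_bf16.xml")

# Calibration samples prepared as numpy arrays (shape is illustrative).
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)]

# The transform function may now return ov.Tensor objects
# in addition to the previously supported input types.
def transform_fn(sample):
    return ov.Tensor(sample)

calibration_dataset = nncf.Dataset(samples, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)
```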
Reason for changes
- BF16 support
Related tickets
- 126782
Tests
- Updated existing tests with BF16.
- Edit: the tests were not updated due to the lack of BF16 precision support in OpenVINO.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.20%. Comparing base (f4bd077) to head (9471ac8).
Additional details and impacted files
Coverage Diff:

Metric | develop | #2307 | +/- |
---|---|---|---|
Coverage | 91.19% | 91.20% | |
Files | 483 | 483 | |
Lines | 46443 | 46435 | -8 |
Hits | 42355 | 42351 | -4 |
Misses | 4088 | 4084 | -4 |
Files | Coverage Δ |
---|---|
nncf/openvino/engine.py | 96.55% <ø> (ø) |
nncf/openvino/graph/model_transformer.py | 94.84% <100.00%> (+0.82%) :arrow_up: |
nncf/openvino/graph/node_utils.py | 98.80% <100.00%> (-0.02%) :arrow_down: |
nncf/openvino/graph/transformations/commands.py | 97.67% <100.00%> (+0.11%) :arrow_up: |
nncf/openvino/quantization/quantize_ifmodel.py | 100.00% <100.00%> (ø) |
.../algorithms/weight_compression/openvino_backend.py | 98.84% <100.00%> (-0.01%) :arrow_down: |
Flag | Coverage Δ |
---|---|
COMMON | 41.93% <0.00%> (+<0.01%) :arrow_up: |
ONNX | 34.19% <0.00%> (+<0.01%) :arrow_up: |
OPENVINO | 40.98% <100.00%> (-0.01%) :arrow_down: |
TENSORFLOW | 29.39% <0.00%> (+<0.01%) :arrow_up: |
TORCH | 65.11% <7.54%> (+0.01%) :arrow_up: |
Flags with carried forward coverage won't be shown.
Components | Coverage Δ |
---|---|
common | 93.54% <ø> (ø) |
torch | 93.65% <ø> (ø) |
tensorflow | 93.26% <ø> (ø) |
onnx | 93.06% <ø> (ø) |
openvino | 94.62% <100.00%> (+0.10%) :arrow_up: |
ptq | 90.50% <100.00%> (-0.01%) :arrow_down: |
@alexsu52, @l-bat, please review.
@l-bat, @alexsu52, @andrey-churkin, please review.
Compression time comparison. Numbers were collected on a local i9-10980XE.
Model | Backend | Compr. Time (develop) | Compr. Time (bf16 branch) |
---|---|---|---|
hf/bert-base-uncased | OV | 00:00:15 | 00:00:15 |
timm/crossvit_9_240 | OV | 00:00:20 | 00:00:20 |
timm/darknet53 | OV | 00:00:44 | 00:00:32 |
timm/deit3_small_patch16_224 | OV | 00:00:25 | 00:00:25 |
timm/dla34 | OV | 00:00:15 | 00:00:15 |
timm/dpn68 | OV | 00:00:16 | 00:00:16 |
timm/efficientnet_b0_BC | OV | 00:00:20 | 00:00:20 |
timm/efficientnet_b0 | OV | 00:00:12 | 00:00:12 |
timm/efficientnet_lite0 | OV | 00:00:11 | 00:00:11 |
timm/hrnet_w18 | OV | 00:01:20 | 00:01:11 |
timm/inception_resnet_v2 | OV | 00:01:23 | 00:01:13 |
timm/levit_128 | OV | 00:00:18 | 00:00:18 |
timm/mobilenetv2_050_BC | OV | 00:00:14 | 00:00:14 |
timm/mobilenetv2_050 | OV | 00:00:09 | 00:00:10 |
timm/mobilenetv3_small_050_BC | OV | 00:00:09 | 00:00:09 |
timm/mobilenetv3_small_050 | OV | 00:00:07 | 00:00:06 |
timm/regnetx_002 | OV | 00:00:09 | 00:00:08 |
timm/resnest14d | OV | 00:00:16 | 00:00:15 |
timm/resnet18 | OV | 00:00:10 | 00:00:10 |
timm/swin_base_patch4_window7_224 | OV | 00:01:18 | 00:01:18 |
timm/tf_inception_v3 | OV | 00:00:31 | 00:00:31 |
timm/vgg11 | OV | 00:00:31 | 00:00:30 |
timm/visformer_small | OV | 00:00:18 | 00:00:17 |
timm/wide_resnet50_2 | OV | 00:00:39 | 00:00:39 |
No degradations were observed.
I would suggest checking the accuracy and performance of the weight compression algorithms for FP32 and FP16 precision.
Here are the results of the weight compression validation on a local i9-10980XE machine:
Model | Backend | Metric name | Metric value (develop) | Metric value (bf16 branch) | Compr. Time (develop) | Compr. Time (bf16 branch) |
---|---|---|---|---|---|---|
tinyllama_data_aware | OV | Similarity | 0.83853 | 0.83853 | 00:01:26 | 00:01:26 |
tinyllama_data_free | OV | Similarity | 0.72057 | 0.72057 | 00:00:44 | 00:00:44 |
Also, here are the numbers from the examples/llm_compression/openvino/tiny_llama example.
[Develop and BF16 branch results: see attached screenshots]
There were no degradations observed. @alexsu52, how did you reproduce the issue?
I used a model with FP16 weights.
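To make the reproduction path explicit, here is a rough sketch of how an FP16-weight model can be obtained and then compressed. The paths are illustrative, and compress_to_fp16 is the standard OpenVINO model-saving option rather than anything introduced by this PR:

```python
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("bloomz-560m/openvino_model.xml")  # illustrative path

# Re-save the IR with the weights compressed to FP16.
ov.save_model(model, "bloomz-560m_fp16.xml", compress_to_fp16=True)

# Run NNCF weight compression on the FP16-weight model.
fp16_model = core.read_model("bloomz-560m_fp16.xml")
compressed_model = nncf.compress_weights(fp16_model)
```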
@andrey-churkin, @l-bat, @kshpv, @alexsu52, @daniil-lyakhov, @andreyanufr, please review.
FP16 model as input (bloomz-560m).
[Develop and branch results: see attached screenshots]