
[PTQ][OV] BF16 support

KodiaqQ opened this pull request 1 year ago • 6 comments

Changes

  • Added BF16 type support
  • Added FQ parameters generation based on type
  • Extended the list of supported OpenVINO input data types with ov.Tensor
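For context on the first item: bfloat16 shares float32's 8-bit exponent and keeps only the top 7 mantissa bits, so converting a float32 value to BF16 amounts to keeping the upper 16 bits of its bit pattern, with round-to-nearest-even applied to the discarded half. A minimal stdlib Python sketch of that conversion (illustrative only, not NNCF's actual implementation):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Convert a finite float32 value to its 16-bit bfloat16 pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits being dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 (exact, no rounding)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Round-tripping a value such as 1.001 through these helpers snaps it to the nearest BF16 representable value (1.0), which illustrates why FQ parameter generation needs to account for the tensor's element type.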

Reason for changes

  • BF16 support

Related tickets

  • 126782

Tests

  • Updated existing tests with BF16.
  • Edit: not updated due to the lack of BF16 precision support in OpenVINO

KodiaqQ · Dec 07 '23 20:12

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.20%. Comparing base (f4bd077) to head (9471ac8).

Additional details and impacted files


@@           Coverage Diff            @@
##           develop    #2307   +/-   ##
========================================
  Coverage    91.19%   91.20%           
========================================
  Files          483      483           
  Lines        46443    46435    -8     
========================================
- Hits         42355    42351    -4     
+ Misses        4088     4084    -4     
| Files | Coverage Δ |
|---|---|
| nncf/openvino/engine.py | 96.55% <ø> (ø) |
| nncf/openvino/graph/model_transformer.py | 94.84% <100.00%> (+0.82%) :arrow_up: |
| nncf/openvino/graph/node_utils.py | 98.80% <100.00%> (-0.02%) :arrow_down: |
| nncf/openvino/graph/transformations/commands.py | 97.67% <100.00%> (+0.11%) :arrow_up: |
| nncf/openvino/quantization/quantize_ifmodel.py | 100.00% <100.00%> (ø) |
| .../algorithms/weight_compression/openvino_backend.py | 98.84% <100.00%> (-0.01%) :arrow_down: |
| Flag | Coverage Δ |
|---|---|
| COMMON | 41.93% <0.00%> (+<0.01%) :arrow_up: |
| ONNX | 34.19% <0.00%> (+<0.01%) :arrow_up: |
| OPENVINO | 40.98% <100.00%> (-0.01%) :arrow_down: |
| TENSORFLOW | 29.39% <0.00%> (+<0.01%) :arrow_up: |
| TORCH | 65.11% <7.54%> (+0.01%) :arrow_up: |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| common | 93.54% <ø> (ø) |
| torch | 93.65% <ø> (ø) |
| tensorflow | 93.26% <ø> (ø) |
| onnx | 93.06% <ø> (ø) |
| openvino | 94.62% <100.00%> (+0.10%) :arrow_up: |
| ptq | 90.50% <100.00%> (-0.01%) :arrow_down: |

codecov[bot] · Dec 07 '23 20:12

@alexsu52, @l-bat, please review.

KodiaqQ · Dec 12 '23 18:12

@l-bat, @alexsu52, @andrey-churkin, please review.

KodiaqQ · Jun 17 '24 15:06

Compression time comparison. Numbers were collected on a local i9-10980XE.

| Model | Backend | Compr. Time (develop) | Compr. Time (bf16 branch) |
|---|---|---|---|
| hf/bert-base-uncased | OV | 00:00:15 | 00:00:15 |
| timm/crossvit_9_240 | OV | 00:00:20 | 00:00:20 |
| timm/darknet53 | OV | 00:00:44 | 00:00:32 |
| timm/deit3_small_patch16_224 | OV | 00:00:25 | 00:00:25 |
| timm/dla34 | OV | 00:00:15 | 00:00:15 |
| timm/dpn68 | OV | 00:00:16 | 00:00:16 |
| timm/efficientnet_b0_BC | OV | 00:00:20 | 00:00:20 |
| timm/efficientnet_b0 | OV | 00:00:12 | 00:00:12 |
| timm/efficientnet_lite0 | OV | 00:00:11 | 00:00:11 |
| timm/hrnet_w18 | OV | 00:01:20 | 00:01:11 |
| timm/inception_resnet_v2 | OV | 00:01:23 | 00:01:13 |
| timm/levit_128 | OV | 00:00:18 | 00:00:18 |
| timm/mobilenetv2_050_BC | OV | 00:00:14 | 00:00:14 |
| timm/mobilenetv2_050 | OV | 00:00:09 | 00:00:10 |
| timm/mobilenetv3_small_050_BC | OV | 00:00:09 | 00:00:09 |
| timm/mobilenetv3_small_050 | OV | 00:00:07 | 00:00:06 |
| timm/regnetx_002 | OV | 00:00:09 | 00:00:08 |
| timm/resnest14d | OV | 00:00:16 | 00:00:15 |
| timm/resnet18 | OV | 00:00:10 | 00:00:10 |
| timm/swin_base_patch4_window7_224 | OV | 00:01:18 | 00:01:18 |
| timm/tf_inception_v3 | OV | 00:00:31 | 00:00:31 |
| timm/vgg11 | OV | 00:00:31 | 00:00:30 |
| timm/visformer_small | OV | 00:00:18 | 00:00:17 |
| timm/wide_resnet50_2 | OV | 00:00:39 | 00:00:39 |

No degradations were observed.

KodiaqQ · Jun 19 '24 11:06

> I would suggest checking the accuracy and performance of the weight compression algorithms for FP32 and FP16 precision.

Here are the results of weight compression validation on a local machine (i9-10980XE):

| Model | Backend | Metric name | Metric value (develop) | Metric value (bf16 branch) | Compr. Time (develop) | Compr. Time (bf16 branch) |
|---|---|---|---|---|---|---|
| tinyllama_data_aware | OV | Similarity | 0.83853 | 0.83853 | 00:01:26 | 00:01:26 |
| tinyllama_data_free | OV | Similarity | 0.72057 | 0.72057 | 00:00:44 | 00:00:44 |
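As an aside on the Similarity metric above: such scores typically compare outputs (or output embeddings) of the reference and compressed models via cosine similarity, where 1.0 means identical. A small self-contained sketch — the exact metric implementation used by the validation harness is an assumption here:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for reference vs. compressed-model output embeddings.
ref = [0.2, 0.8, 0.5]
compressed = [0.19, 0.81, 0.52]
score = cosine_similarity(ref, compressed)
```

Identical similarity values for develop and the bf16 branch, as in the table, indicate the change is numerically neutral for these models.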

Also, here are the numbers from the examples/llm_compression/openvino/tiny_llama example. Develop: develop_weight_compression; BF16 branch: bf16_weight_compression

There were no degradations observed. @alexsu52, how did you reproduce the issue?

KodiaqQ · Jun 25 '24 07:06

> I would suggest checking the accuracy and performance of the weight compression algorithms for FP32 and FP16 precision.
>
> Here are the results of weight compression validation on a local machine (i9-10980XE):
>
> | Model | Backend | Metric name | Metric value (develop) | Metric value (bf16 branch) | Compr. Time (develop) | Compr. Time (bf16 branch) |
> |---|---|---|---|---|---|---|
> | tinyllama_data_aware | OV | Similarity | 0.83853 | 0.83853 | 00:01:26 | 00:01:26 |
> | tinyllama_data_free | OV | Similarity | 0.72057 | 0.72057 | 00:00:44 | 00:00:44 |
>
> Also, here are the numbers from the examples/llm_compression/openvino/tiny_llama example. Develop: develop_weight_compression; BF16 branch: bf16_weight_compression
>
> There were no degradations observed. @alexsu52, how did you reproduce the issue?

I used a model with FP16 weights.
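FP16's narrow dynamic range (5-bit exponent, largest finite value 65504) is what makes FP16-weight models a good stress test here, while BF16 keeps float32's full exponent range. A quick stdlib illustration of FP16's limits, assuming CPython's `struct` half-precision `"e"` format:

```python
import struct

def fp16_roundtrip(x: float) -> float:
    """Pack a value to IEEE 754 half precision and unpack it again.

    struct raises OverflowError for finite values outside the fp16
    range (roughly |x| > 65504), values BF16 would still represent.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

largest = fp16_roundtrip(65504.0)  # exactly representable, survives intact
lossy = fp16_roundtrip(0.1)        # snaps to the nearest fp16 neighbour of 0.1
```

The precision loss on round trip is why weight statistics and FQ parameters can differ between FP32- and FP16-weight models.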

alexsu52 · Jun 25 '24 09:06

@andrey-churkin, @l-bat, @kshpv, @alexsu52, @daniil-lyakhov, @andreyanufr, please review.

KodiaqQ · Jul 10 '24 10:07

FP16 model as input (bloomz-560m). Develop: (screenshot) Branch: (screenshot)

KodiaqQ · Jul 12 '24 12:07