
[WC] Align compression subgraphs for both weight input data types

Open · nikita-savelyevv opened this issue 11 months ago

Changes

When compression is applied to a model saved with FP32 weights, the resulting graph differs from the one produced when the input model is saved with FP16 weights. This PR aligns the two cases so that the compression subgraph is identical for both; the subgraph is shown below. Weight, scale and zero point are converted to FP32, and the Convert node after Multiply, which is present in the FP16 input case, is bypassed. (subgraph screenshot)
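For illustration, a minimal sketch of the two input cases being aligned (hypothetical file names, not code from this PR): the same compression call on an IR saved with FP32 weights and on one saved with FP16 weights should now produce the same decompression subgraph.

```python
# Minimal sketch: compress the same model saved with FP32 and with FP16 weights.
# File names are hypothetical; after this PR both compressed graphs should
# contain the same Convert -> Subtract -> Multiply decompression pattern.
import openvino as ov
import nncf

core = ov.Core()

model_fp32 = core.read_model("model_fp32.xml")  # IR saved with FP32 weights
model_fp16 = core.read_model("model_fp16.xml")  # IR saved with FP16 weights

compressed_fp32 = nncf.compress_weights(model_fp32)
compressed_fp16 = nncf.compress_weights(model_fp16)
```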

nikita-savelyevv avatar Feb 29 '24 14:02 nikita-savelyevv

Codecov Report

Attention: Patch coverage is 0%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 77.93%. Comparing base (573b0c3) to head (4ce510a). Report is 1 commit behind head on develop.

Additional details and impacted files


@@             Coverage Diff              @@
##           develop    #2537       +/-   ##
============================================
- Coverage    90.87%   77.93%   -12.94%     
============================================
  Files          494      494               
  Lines        45612    45416      -196     
============================================
- Hits         41449    35397     -6052     
- Misses        4163    10019     +5856     
Files Coverage Δ
.../algorithms/weight_compression/openvino_backend.py 0.00% <0.00%> (-98.34%) :arrow_down:

... and 107 files with indirect coverage changes

Flag Coverage Δ
COMMON ?
ONNX ?
OPENVINO ?
TENSORFLOW 30.10% <0.00%> (ø)
TORCH 65.96% <0.00%> (-0.01%) :arrow_down:

Flags with carried forward coverage won't be shown.

Components Coverage Δ
common 88.28% <ø> (-5.47%) :arrow_down:
torch 93.49% <ø> (-0.01%) :arrow_down:
tensorflow 93.74% <ø> (+1.00%) :arrow_up:
onnx 0.00% <ø> (-93.09%) :arrow_down:
openvino 25.70% <0.00%> (-68.47%) :arrow_down:
ptq 53.06% <0.00%> (-37.03%) :arrow_down:

codecov[bot] avatar Feb 29 '24 14:02 codecov[bot]

The WC manual test fails until #2569 is merged.

nikita-savelyevv avatar Mar 19 '24 15:03 nikita-savelyevv

post_training_weight_compression test build 34 is green.

nikita-savelyevv avatar Mar 27 '24 10:03 nikita-savelyevv

@alexsu52 @nikita-savelyevv I've measured the compression time and the total time for different weight compression cases on the develop branch and on the current PR (timing screenshots attached for both).

It seems that model inference takes almost twice as long on the validation dataset. Does this mean that the compressed model should be saved differently in the tests and on the customer side? https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/lm_weight_compression.py#L174
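For reference, a minimal sketch of how an OpenVINO model can be saved in such a pipeline (the exact code in lm_weight_compression.py may differ); the compress_to_fp16 flag controls whether FP32 constants are converted to FP16 in the saved IR, i.e. which of the two weight input cases the reloaded model falls into:

```python
# Minimal sketch, not the exact test code.
import openvino as ov

def save_compressed_model(model: ov.Model, output_path: str) -> None:
    # Keep constants as stored in the model; pass compress_to_fp16=True
    # to convert FP32 constants to FP16 in the saved IR.
    ov.save_model(model, output_path, compress_to_fp16=False)
```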

ljaljushkin avatar Mar 27 '24 20:03 ljaljushkin

@ljaljushkin Thanks for highlighting this!

The reason is that during compression with a group size, there is an additional Reshape node. In this PR, a Convert f16→f32 node is added after the scale Multiply node. If the Convert is placed before the Reshape node, performance drops. To fix this, I moved the Convert node after the Reshape node (see the sketch after the screenshots below).

(Before / after subgraph screenshots.)
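A rough illustration of the resulting decompression pattern (hypothetical shapes and group size, not the actual NNCF transformation code), with the f16 → f32 Convert placed after the group-wise Reshape instead of before it:

```python
import numpy as np
from openvino.runtime import opset13 as ops

rows, groups, group_size = 128, 4, 32  # assumed dimensions for illustration

weight = ops.constant(np.zeros((rows, groups, group_size), dtype=np.uint8))  # compressed weight
zero_point = ops.constant(np.zeros((rows, groups, 1), dtype=np.uint8))
scale = ops.constant(np.ones((rows, groups, 1), dtype=np.float16))

# Decompression: Convert -> Subtract -> Multiply, still in f16 at this point.
w_f16 = ops.convert(weight, "f16")
zp_f16 = ops.convert(zero_point, "f16")
dequantized = ops.multiply(ops.subtract(w_f16, zp_f16), scale)

# Group-wise Reshape back to the 2D weight shape.
target_shape = ops.constant(np.array([rows, groups * group_size], dtype=np.int64))
reshaped = ops.reshape(dequantized, target_shape, special_zero=False)

# Convert to f32 placed after the Reshape to avoid the performance drop.
decompressed = ops.convert(reshaped, "f32")
```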

With this, performance is maintained after changes in the PR:

Test case                          | Total time (develop branch) | Total time (PR branch)
tinyllama_data_free                | 04:18                       | 04:21
tinyllama_data_aware               | 04:06                       | 04:07
tinyllama_data_aware_awq           | 03:33                       | 03:39
tinyllama_data_aware_awq_stateful  | 03:03                       | 03:03

nikita-savelyevv avatar Apr 04 '24 19:04 nikita-savelyevv

post_training_weight_compression test build 42 is green. Waiting for results of OV validation across different hardware.

nikita-savelyevv avatar Apr 05 '24 08:04 nikita-savelyevv