nncf icon indicating copy to clipboard operation
nncf copied to clipboard

[NNCF]: Optimized memory footprint by removing redundant collected statistics

Open AdiKsOnDev opened this issue 3 months ago • 29 comments

Changes

Made changes in:

  • nncf/common/tensor_statistics/statistic_point.py
  • nncf/quantization/algorithms/pipeline.py

Focus on the way I used newly created remove_statistic_point() inside pipeline.py, to see if it's up to the expectations

Reason for changes

Save space by removing "unused" statistic points associated with an algorithm

Related tickets

120377

Tests

tests/common/test_statistic_points.py was added:

  • [x] Test removing associated statistical points
  • [x] Test removing from an empty container

Closes #2557

AdiKsOnDev avatar Mar 08 '24 12:03 AdiKsOnDev

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 52.32%. Comparing base (8ad77dc) to head (aefd170). Report is 3 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##           develop    #2563       +/-   ##
============================================
- Coverage    77.91%   52.32%   -25.59%     
============================================
  Files          494      494               
  Lines        45387    45387               
============================================
- Hits         35363    23749    -11614     
- Misses       10024    21638    +11614     
Files Coverage Δ
nncf/common/tensor_statistics/statistic_point.py 93.10% <100.00%> (+1.61%) :arrow_up:
nncf/quantization/algorithms/algorithm.py 100.00% <100.00%> (ø)
nncf/quantization/algorithms/pipeline.py 93.50% <100.00%> (-1.17%) :arrow_down:

... and 276 files with indirect coverage changes

Flag Coverage Δ
COMMON 44.22% <77.77%> (?)
ONNX 34.65% <100.00%> (?)
TENSORFLOW 30.11% <22.22%> (+<0.01%) :arrow_up:
TORCH ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
common 87.60% <100.00%> (-0.72%) :arrow_down:
torch 32.25% <ø> (-61.24%) :arrow_down:
tensorflow 93.74% <ø> (ø)
onnx 93.07% <ø> (+93.07%) :arrow_up:
openvino 0.00% <ø> (-25.77%) :arrow_down:
ptq 48.47% <100.00%> (-4.58%) :arrow_down:

codecov[bot] avatar Mar 08 '24 13:03 codecov[bot]

@kshpv Are any actions expected from me any further?

AdiKsOnDev avatar Mar 08 '24 18:03 AdiKsOnDev

@kshpv Is any action expected from me any further?

Could you add a test on this functionality? You can create a file test_statistic_points.py with a test and put it in tests/common/

Also, it would be beneficial if you provide the memory usage before and after your changes for some of NNCF examples

Thank you!

kshpv avatar Mar 08 '24 18:03 kshpv

Yes, absolutely! I shall request a re-review for another PR tomorrow (hopefully) with a test for this and I'll give the results of a benchmark before/after the change

Thank you for responding so quickly!

AdiKsOnDev avatar Mar 08 '24 18:03 AdiKsOnDev

Benchmark Examples

@kshpv

Before

before

After

Screenshot_20240309_161303

AdiKsOnDev avatar Mar 09 '24 12:03 AdiKsOnDev

@KodiaqQ Are any actions expected from me any further?

AdiKsOnDev avatar Mar 11 '24 13:03 AdiKsOnDev

Benchmark Examples

@kshpv

Before

...

After

...

Hi, @AdiKsOnDev. Thank you for your contribution. I wonder to understand, why the performance of the model was decreased after the changes?

KodiaqQ avatar Mar 11 '24 13:03 KodiaqQ

Benchmark Examples

@kshpv

Before

...

After

...

Hi, @AdiKsOnDev. Thank you for your contribution. I wonder to understand, why the performance of the model was decreased after the changes?

Hi, can it be that in the remove_statistical_points() the loop that's doing all the work is taking too long?

AdiKsOnDev avatar Mar 11 '24 13:03 AdiKsOnDev

@KodiaqQ I returned the deleted line. Will update you with the benchmark in a bit.

AdiKsOnDev avatar Mar 11 '24 16:03 AdiKsOnDev

Benchmarks

@KodiaqQ EDIT: I ran the benchmark again without the remove_statistical_points() and got the performance speed of 1.952 Everything else (Besides the latency and throughput) is the same as in screenshots below

Before (Taken from previous comment)

before

After

image

AdiKsOnDev avatar Mar 11 '24 16:03 AdiKsOnDev

@AdiKsOnDev, could you please provide memory usage measurements before & after changes?

KodiaqQ avatar Mar 12 '24 07:03 KodiaqQ

@AdiKsOnDev, could you please provide memory usage measurements before & after changes?

Sorry if it's a stupid question, but where can I check that?

AdiKsOnDev avatar Mar 12 '24 07:03 AdiKsOnDev

@AdiKsOnDev, could you please provide memory usage measurements before & after changes?

Sorry if it's a stupid question, but where can I check that?

You can use a memory-profiler package, for example.

KodiaqQ avatar Mar 12 '24 07:03 KodiaqQ

@AdiKsOnDev, could you please provide memory usage measurements before & after changes?

Sorry if it's a stupid question, but where can I check that?

You can use a memory-profiler package, for example.

Okay, thank you

AdiKsOnDev avatar Mar 12 '24 07:03 AdiKsOnDev

@KodiaqQ

Before

image

After

image

AdiKsOnDev avatar Mar 12 '24 07:03 AdiKsOnDev

Pardon, messed up the screenshots, will update in a bit

AdiKsOnDev avatar Mar 12 '24 07:03 AdiKsOnDev

@KodiaqQ

Before

image image

After

image image

AdiKsOnDev avatar Mar 12 '24 08:03 AdiKsOnDev

@KodiaqQ

Before

..

After

..

I think that change from ~765Mb -> ~750Mb while quantize is what we want.

KodiaqQ avatar Mar 12 '24 08:03 KodiaqQ

@KodiaqQ

Before

..

After

..

I think that change from ~765Mb -> ~750Mb while quantize is what we want.

Perfect! I hope we're good to go :)

AdiKsOnDev avatar Mar 12 '24 08:03 AdiKsOnDev

Internal run test_examples/295/ passed.

KodiaqQ avatar Mar 12 '24 11:03 KodiaqQ

@AdiKsOnDev, could you run another experiment, please? For the memory measurements in different cases. It may be done with the mobilenetv2 example, that you've used before. Here are the changes that should be enough to activate the extended scope of the algorithms:

image changes.patch

With these changes, it needs to measure only memory consumption (may be done via memory-profiler as well). It would be great if you could provide only plots before and after. An example of the plot generation with memory-profiler can be found in the documentation (mprof plot command, as I can see).

KodiaqQ avatar Mar 12 '24 15:03 KodiaqQ

Sure, I'll try once I'm home Will update u tomorrow

AdiKsOnDev avatar Mar 12 '24 16:03 AdiKsOnDev

@KodiaqQ Pardon me for not showing the plots for the previous test, I did see them in the PyPi documentation but couldn't use them bcs of PyQt5... Anyways, here you go:

Before

image

After

image

AdiKsOnDev avatar Mar 12 '24 18:03 AdiKsOnDev

@KodiaqQ following up on the above

AdiKsOnDev avatar Mar 13 '24 09:03 AdiKsOnDev

@AdiKsOnDev, currently the latest changes break the tests. Could you fix the issues, please?

KodiaqQ avatar Mar 13 '24 09:03 KodiaqQ

@AdiKsOnDev, currently the latest changes break the tests. Could you fix the issues, please?

Yeah, I am already doing that

AdiKsOnDev avatar Mar 13 '24 10:03 AdiKsOnDev

@KodiaqQ Plots after the (possible) fix:

Before remove_statistical_points()

image

After remove_statistical_points()

image

AdiKsOnDev avatar Mar 13 '24 10:03 AdiKsOnDev

@KodiaqQ I read the review, will implement the changes as soon as I get the chance Thank you, have a good evening!

AdiKsOnDev avatar Mar 13 '24 14:03 AdiKsOnDev

@KodiaqQ could you check _algorithm_key property I added in the recent commit, please? I am not 100% sure if this is what you needed

AdiKsOnDev avatar Mar 15 '24 07:03 AdiKsOnDev

@KodiaqQ following up

AdiKsOnDev avatar Mar 19 '24 03:03 AdiKsOnDev