[WeightCompression] Statistics caching
Changes
Add statistics saving and loading for the WeightCompression algorithm:
- Statistics are cached for all possible configurations, e.g. gptq=True, awq=True, scale_estimation=True, with all types of sensitivities.
- The statistics are then dumped to a file that can be reused for any weights_compression() configuration.
The example for tinyllama was updated with this functionality.
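The collect-once, reuse-everywhere workflow described above can be sketched as follows. This is an illustrative sketch only, not the actual NNCF API; the function and parameter names (`get_statistics`, `collect_fn`, `load_fn`, `dump_fn`) are hypothetical:

```python
import os

# Hypothetical caching workflow: collect statistics in one expensive pass,
# then reload them from disk for every subsequent compression configuration.
def get_statistics(model, dataset, cache_path, collect_fn, load_fn, dump_fn):
    if os.path.exists(cache_path):
        # Cache hit: skip the costly statistics collection entirely.
        return load_fn(cache_path)
    stats = collect_fn(model, dataset)  # one-time pass over the dataset
    dump_fn(stats, cache_path)
    return stats
```

Because the cached statistics cover all configurations (AWQ, GPTQ, scale estimation, all sensitivity types), trying a different compression setup only pays the load cost, not the collection cost.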
More changes:
- Align all statistics used in WeightCompression with TensorStatistics from nncf/experimental/common/tensor_statistics/statistics.py.
- Extend StatisticsAggregator with the logic for loading and saving statistics.
- Dump statistics using pickle and gzip; serialization methods were added for Tensor.
- Introduce StatisticsSerializer to handle loading/dumping statistics to a file.
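A minimal sketch of the pickle+gzip approach mentioned above. The class below mirrors the idea, not the actual NNCF StatisticsSerializer implementation; its method names and signatures are assumptions for illustration:

```python
import gzip
import pickle

# Illustrative pickle+gzip serializer (not the real NNCF class).
class StatisticsSerializer:
    @staticmethod
    def dump(statistics: dict, path: str) -> None:
        # gzip keeps the on-disk footprint of large tensor statistics
        # manageable (e.g. ~298 MB for tinyllama with subset size 128).
        with gzip.open(path, "wb") as f:
            pickle.dump(statistics, f)

    @staticmethod
    def load(path: str) -> dict:
        with gzip.open(path, "rb") as f:
            return pickle.load(f)
```

Usage is a simple round trip: dump the collected statistics once, then load them back for any later compression run.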
Statistics sizes
Model | Subset size | Statistics file size | Statistics collection time
---|---|---|---
tinyllama | 128 | 298 MB | 61 sec
Reason for changes
Speed up the search for the best compression configuration: statistics are collected once and reused across runs instead of being recollected for every configuration.
Related tickets
153129
Tests
Test coverage was extended with tests for StatisticsSerializer, StatisticsAggregator, and the WeightCompression algorithm with the proposed functionality.