
[WeightCompression] Statistics caching

Open kshpv opened this issue 4 months ago • 7 comments

Changes

Add statistics saving and loading for the WeightCompression algorithm:

  1. Statistics are cached for all possible configurations, such as gptq=True, awq=True, and scale_estimation=True, with all types of sensitivity metrics.
  2. The statistics are then dumped to a file, which can be reused for any weights_compression() configuration.

The example for tinyllama was updated with this functionality.

More changes:

  1. Align all statistics used in WeightCompression with TensorStatistics from nncf/experimental/common/tensor_statistics/statistics.py.
  2. Extend StatisticsAggregator with logic for loading and saving statistics.
  3. Dump statistics using pickle and gzip. Serialization methods were added for Tensor.
  4. Introduce StatisticsSerializer to handle loading statistics from a file and dumping them to it.
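
The pickle-plus-gzip round trip from item 3 can be sketched with the standard library alone. This is only an illustration of the technique: the function names `dump_statistics` and `load_statistics`, the file name, and the layout of the statistics dictionary are assumptions made for this example and are not the actual StatisticsSerializer API.

```python
import gzip
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical layout: node name -> statistic name -> plain Python values.
Statistics = Dict[str, Dict[str, Any]]


def dump_statistics(statistics: Statistics, path: Path) -> None:
    """Pickle the statistics and write them gzip-compressed to `path`."""
    with gzip.open(path, "wb") as f:
        pickle.dump(statistics, f)


def load_statistics(path: Path) -> Statistics:
    """Read a gzip-compressed pickle file written by `dump_statistics`."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Illustrative values only; real statistics would hold tensor data.
statistics: Statistics = {
    "layer_0": {"mean": [0.1, 0.2], "shape": (2,)},
    "layer_1": {"mean": [0.3, 0.4], "shape": (2,)},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    stats_file = Path(tmp_dir) / "statistics.pkl.gz"  # hypothetical file name
    dump_statistics(statistics, stats_file)   # collect once, then dump
    restored = load_statistics(stats_file)    # reuse for later configurations
```

Because pickle can execute arbitrary code while loading, a file like this should only be loaded when it comes from a trusted source.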

Statistics sizes

| Model | Subset size | Statistics file size | Statistics collection time |
|---|---|---|---|
| tinyllama | 128 | 298 MB | 61 sec |

Reason for changes

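Speed up the search for a suitable compression configuration: cached statistics can be reused instead of being collected again for each configuration.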

Related tickets

153129

Tests

Test coverage was extended with tests for StatisticsSerializer, StatisticsAggregator, and the WeightCompression algorithm with the proposed functionality.

kshpv, Oct 15 '24 16:10