sparseml
Adding HistogramObserver
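This PR adds support for PyTorch's HistogramObserver. The observer records a running histogram of tensor values and chooses the min/max values used for quantization so that the quantization error is minimized, rather than simply taking the observed extremes. The implementation has been tested on CodeLlama and Llama-2 models.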
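The following pure-Python sketch illustrates the idea behind histogram-based range selection. It is not PyTorch's implementation (PyTorch exposes the real observer as `torch.ao.quantization.HistogramObserver`, which uses its own search and error approximation), and it does not reflect how this PR wires the observer into sparseml. The function names, the exhaustive search over bin edges, and the error model (uniform rounding noise of scale²/12 inside the range, squared clipping distance outside it) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def build_histogram(values: Sequence[float], bins: int) -> Tuple[List[int], float, float]:
    """Bucket values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1  # the maximum value lands in the last bin
    return counts, lo, width


def range_error(counts: List[int], lo: float, width: float,
                start: int, end: int, num_bits: int) -> float:
    """Approximate squared quantization error when clipping to bins [start, end)."""
    q_lo = lo + start * width
    q_hi = lo + end * width
    scale = (q_hi - q_lo) / (2 ** num_bits - 1)
    err = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue
        if start <= i < end:
            # Assumption: values inside the range see uniform rounding noise, E[e^2] = scale^2 / 12.
            err += c * scale ** 2 / 12
        else:
            # Values outside the range are clipped; the bin center stands in for them.
            center = lo + (i + 0.5) * width
            edge = q_lo if i < start else q_hi
            err += c * (center - edge) ** 2
    return err


def search_min_max(values: Sequence[float], bins: int = 64, num_bits: int = 8) -> Tuple[float, float]:
    """Hypothetical helper: try every pair of bin edges and keep the lowest-error range."""
    counts, lo, width = build_histogram(values, bins)
    if width == 0:
        return lo, lo  # constant input: nothing to search
    best_err, best = float("inf"), (lo, lo + bins * width)
    for start in range(bins):
        for end in range(start + 1, bins + 1):
            err = range_error(counts, lo, width, start, end, num_bits)
            if err < best_err:
                best_err, best = err, (lo + start * width, lo + end * width)
    return best


if __name__ == "__main__":
    # Synthetic activations: 10,000 values spread over [-1, 1] plus one outlier at 10.
    acts = [-1 + 2 * i / 9999 for i in range(10000)] + [10.0]
    print("plain min/max:   ", (min(acts), max(acts)))
    print("histogram search:", search_min_max(acts, bins=64, num_bits=4))
```

With a single large outlier, a plain min/max observer stretches the range to 10 and spends most of the 4-bit grid on empty space; the search instead keeps roughly the bulk of the distribution and clips the outlier. The exhaustive search costs O(bins³) and is only practical for small bin counts, which is one reason a production observer would use a cheaper search.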