
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

Results: 165 sparseml issues

Replace quant_min/max values with INT8/UINT8 ranges so ONNX export is supported for other bit widths (e.g., 4).
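A minimal sketch of the underlying idea (not the PR's actual code): the range of any sub-8-bit width fits inside the INT8/UINT8 range, so fake-quantization nodes clamped to, say, 4-bit values can still be exported through ONNX's int8/uint8-only quantization ops.

```python
def bit_range(bits: int, signed: bool = True) -> tuple:
    """Return the (quant_min, quant_max) integer range for a bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

# 4-bit values fall inside the INT8 range, so an exported
# QuantizeLinear node can carry them in an int8 tensor.
assert bit_range(4) == (-8, 7)
assert bit_range(8) == (-128, 127)
assert bit_range(4, signed=False) == (0, 15)
```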

This PR adds a transformation that quantizes weights for weights-only quantization. It was tested on a Llama2 model.

While applying modifiers, we use the module's device to place additional modules/buffers (such as fake-quantization modules) on the correct device. Presently, we default to cpu in case the module...

This PR adds support for per-token dynamic quantization. Quantization scales and zero points are computed "on-the-fly" for each new tensor. Each token has its own quantization scale and zero-point (one...
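A minimal NumPy sketch of per-token dynamic quantization (an illustration of the concept, not SparseML's implementation): each row of the activation tensor gets its own scale and zero point, computed on the fly from that row's min/max.

```python
import numpy as np

def per_token_dynamic_quantize(x: np.ndarray, bits: int = 8):
    """Asymmetric dynamic quantization with one scale/zero-point per token (row)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    # Per-row scale; clamp to avoid division by zero for constant rows.
    scale = np.maximum(x_max - x_min, 1e-8) / (qmax - qmin)
    zero_point = np.round(qmin - x_min / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

# Round-trip check: dequantized values stay close to the originals.
x = np.array([[0.0, 1.0, 2.0], [-1.0, 0.0, 3.0]])
q, scale, zero_point = per_token_dynamic_quantize(x)
dequant = (q.astype(np.float64) - zero_point) * scale
assert np.allclose(dequant, x, atol=0.05)
```

Because the scales are derived from each incoming tensor rather than from calibration data, no observers or calibration passes are needed at export time.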

The file_path was being joined twice to the file names, leading to wrong paths (lines 663 and 683). This PR removes the duplicate path join.

Example recipe for what this enables:

```yaml
OutputDistillationModifier:
  targets: ['layer.1', 'layer.2']
  transforms: []
  comparison: "square_head"
  orig_scale: 1.0
  distill_scale: 1.0
```

Defines separate AddInput classes for the different branches of the bottleneck, allowing quantization to be controlled by module class in addition to module name.

This PR enables exporting encoder-decoder models such as FLAN-T5 via the SparseML pathway. Tested on FLAN-T5 models.