sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Replace quant_min/max values with INT8/UINT8 ranges so ONNX export is supported for other bit widths (e.g., 4).
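A minimal sketch of the idea, with hypothetical helper names: ONNX's 8-bit quantization ops expect INT8/UINT8 bounds, so a narrower fake-quant range (e.g., 4-bit `[0, 15]`) is declared with the full 8-bit range at export time, while the stored integer values, scale, and zero point stay unchanged.

```python
# Hypothetical sketch: widen a narrow quant range to the INT8/UINT8
# bounds ONNX expects. Dequantized values are unaffected because the
# integers, scale, and zero point are not modified.

def widen_to_8bit(quant_min, quant_max):
    """Map a narrow quantization range onto INT8/UINT8 bounds."""
    if quant_min < 0:
        return -128, 127  # signed -> INT8
    return 0, 255         # unsigned -> UINT8

def dequant(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = 0.1, 8
q4 = 13                                  # a 4-bit value in [0, 15]
before = dequant(q4, scale, zp)
new_min, new_max = widen_to_8bit(0, 15)  # declared range becomes (0, 255)
after = dequant(q4, scale, zp)           # value/scale/zero-point unchanged
assert before == after
```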
This PR adds a transformation that quantizes weights for weights-only quantization. It was tested on a Llama2 model.
While applying modifiers, we use the module's device to place additional modules and buffers, such as fake-quantization modules, on the correct device. Presently, we default to cpu in case the module...
This PR adds support for per-token dynamic quantization. Quantization scales and zero points are computed "on-the-fly" for each new tensor. Each token has its own quantization scale and zero-point (one...
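A plain-Python sketch (hypothetical helper name, not the PR's API) of what per-token dynamic quantization computes: each token derives its own asymmetric UINT8 scale and zero point on the fly from that token's min/max, rather than reusing static calibration statistics.

```python
# Per-token dynamic quantization sketch: one (scale, zero_point)
# pair per token, computed from that token's own range.

def per_token_dynamic_quantize(tokens, bits=8):
    qmin, qmax = 0, 2 ** bits - 1
    quantized = []
    for token in tokens:              # one row of activations per token
        lo = min(min(token), 0.0)     # include 0 so it is exactly representable
        hi = max(max(token), 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1e-8
        zero_point = round(qmin - lo / scale)
        q = [min(qmax, max(qmin, round(v / scale + zero_point))) for v in token]
        quantized.append((q, scale, zero_point))
    return quantized

acts = [[-1.0, 0.5, 2.0], [0.1, 0.2, 0.3]]
out = per_token_dynamic_quantize(acts)
# each entry carries its own (values, scale, zero_point)
```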
The file_path was being joined twice to the file names, leading to wrong paths (lines 663 and 683). This PR removes the duplicate path join.
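A hypothetical reproduction of the bug pattern (names are illustrative, not the actual sparseml code): file names were joined to `file_path` once when collected and again when used, duplicating the directory prefix.

```python
import os

file_path = "model_dir"
names = ["a.onnx", "b.onnx"]

files = [os.path.join(file_path, n) for n in names]  # joined once

# Buggy: joining file_path a second time duplicates the prefix,
# e.g. "model_dir/model_dir/a.onnx"
buggy = [os.path.join(file_path, f) for f in files]

# Fixed: use the already-joined paths directly,
# e.g. "model_dir/a.onnx"
fixed = files
```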
Example recipe for what this enables:

```yaml
OutputDistillationModifier:
  targets: ['layer.1', 'layer.2']
  transforms: []
  comparison: "square_head"
  orig_scale: 1.0
  distill_scale: 1.0
```
Define separate AddInput classes for the different branches of the bottleneck. This allows controlling quantization via module class in addition to module name.
This PR enables exporting encoder-decoder models such as FLAN-T5 through the sparseml export pathway. Tested on FLAN-T5 models.