dbogunowicz
The goal of this PR is to support loading weights from the compressed `safetensor` representation. The compressed `safetensor` representation was introduced by Neural Magic and implemented by @Satrat...
Note: this branch requires this PR: https://github.com/neuralmagic/compressed-tensors/pull/46 to land in `compressed-tensors`.

## Example Use:

```python
from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer, oneshot
import os
import torch

model_name = "Isotonic/TinyMixtral-4x248M-MoE"
model = ...
```
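The snippet above is cut off before the save/reload round trip, so here is a minimal sketch of the intended flow. It assumes a `save_compressed=True` flag on `save_pretrained` and that `from_pretrained` can transparently decompress the resulting checkpoint; both of these are assumptions for illustration, not details confirmed by the snippet.

```python
# Hedged sketch of the intended round trip (flag name is an assumption):
from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer

model_name = "Isotonic/TinyMixtral-4x248M-MoE"

# Load a dense model and tokenizer as usual
model = SparseAutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = SparseAutoTokenizer.from_pretrained(model_name)

# Save in the compressed safetensor representation
# (`save_compressed` is an assumed flag, shown for illustration only)
model.save_pretrained("compressed_model", save_compressed=True)
tokenizer.save_pretrained("compressed_model")

# Reload directly from the compressed representation; this PR adds the
# decompression step on load
reloaded = SparseAutoModelForCausalLM.from_pretrained("compressed_model")
```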
We need to update the `transformers` version to support the Qwen2-MoE model, see: https://github.com/huggingface/transformers/releases/tag/v4.40.0 _(it also fits our goal of consistently tracking the latest release)_.

## Important changes

####...
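As a quick sanity check of the dependency bump, something like the following should now resolve to the Qwen2-MoE architecture. The model id below is only an illustrative example of a Qwen2-MoE checkpoint, not one referenced in this PR.

```python
# Minimal sketch: after bumping transformers to >= 4.40.0, a Qwen2-MoE
# checkpoint should load through the sparseml wrapper as well.
# The model id is an illustrative example only.
from sparseml.transformers import SparseAutoModelForCausalLM

model = SparseAutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")
print(model.__class__.__name__)  # expected: 'Qwen2MoeForCausalLM'
```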
## Feature Description

Now this executes properly:

```python
from sparseml.transformers import SparseAutoModelForCausalLM

model = SparseAutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True)
print(model.__class__.__name__)
>> 'Phi3ForCausalLM'
```

The hack was to temporarily rename the class so that the...
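The description is cut off before it explains the rename in full; the sketch below only illustrates the general shape of a "temporarily rename the class" workaround (swapping a class's `__name__` for the duration of a call and restoring it afterwards). The helper and the class names here are assumptions for illustration, not the actual implementation in this PR.

```python
# Rough illustration only - NOT the PR's implementation, just the general
# shape of a temporary class-rename workaround.
from contextlib import contextmanager

@contextmanager
def temporarily_renamed(cls, name: str):
    """Swap cls.__name__ to `name` for the duration of the block."""
    original_name = cls.__name__
    cls.__name__ = name
    try:
        yield cls
    finally:
        cls.__name__ = original_name

# Hypothetical usage: present a wrapper class under an upstream name while
# remote code inspects it, then restore the wrapper's own name.
class SparseWrapper:  # stand-in for the real wrapper class
    pass

with temporarily_renamed(SparseWrapper, "AutoModelForCausalLM"):
    print(SparseWrapper.__name__)  # 'AutoModelForCausalLM'
print(SparseWrapper.__name__)      # 'SparseWrapper'
```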
## Feature description

Adding the logic for loading quantized models with additional kv cache quantization, generated using the `compressed-tensors` framework.

Notable changes:
- `CompressedTensorsConfig` now expects to read an optional...
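The list above is truncated before naming the optional field, so purely as an illustration, here is a hedged sketch of what a kv cache quantization entry in a `compressed-tensors`-style quantization config might look like. The `kv_cache_scheme` key and its parameters are assumptions based on the general shape of such configs, not copied from this PR.

```python
# Hedged sketch: a hypothetical `quantization_config` entry in a model's
# config.json with an additional kv cache quantization section.
# The `kv_cache_scheme` key and its fields are assumptions for illustration.
quantization_config = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
            "input_activations": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
        }
    },
    # assumed optional entry describing how k/v cache activations are quantized
    "kv_cache_scheme": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
}
```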