dbogunowicz
The goal of this PR is to support loading weights from the compressed `safetensor` representation. The compressed `safetensor` representation was introduced by Neural Magic and implemented by @Satrat...
Note: this branch requires this PR: https://github.com/neuralmagic/compressed-tensors/pull/46 to land in `compressed-tensors`.

## Example Use:

```python
from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer, oneshot
import os
import torch

model_name = "Isotonic/TinyMixtral-4x248M-MoE"
model = ...
```
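The snippet above is cut off before the save/reload round trip, so here is a minimal sketch of the intended flow. It assumes a `save_compressed=True` flag on `save_pretrained` and that `from_pretrained` can transparently decompress the resulting checkpoint; both of these are assumptions for illustration, not details confirmed by the snippet.

```python
# Hedged sketch of the intended round trip (flag name is an assumption):
from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer

model_name = "Isotonic/TinyMixtral-4x248M-MoE"

# Load a dense model and tokenizer as usual
model = SparseAutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = SparseAutoTokenizer.from_pretrained(model_name)

# Save in the compressed safetensor representation
# (`save_compressed` is an assumed flag, shown for illustration only)
model.save_pretrained("compressed_model", save_compressed=True)
tokenizer.save_pretrained("compressed_model")

# Reload directly from the compressed representation; this PR adds the
# decompression step on load
reloaded = SparseAutoModelForCausalLM.from_pretrained("compressed_model")
```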
We need to update the `transformers` version to support the Qwen2-MoE model, see: https://github.com/huggingface/transformers/releases/tag/v4.40.0 _(it also fits our goal of consistently tracking the latest release)_.

## Important changes

####...
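As a quick sanity check of the dependency bump, something like the following should now resolve to the Qwen2-MoE architecture. The model id below is only an illustrative example of a Qwen2-MoE checkpoint, not one referenced in this PR.

```python
# Minimal sketch: after bumping transformers to >= 4.40.0, a Qwen2-MoE
# checkpoint should load through the sparseml wrapper as well.
# The model id is an illustrative example only.
from sparseml.transformers import SparseAutoModelForCausalLM

model = SparseAutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")
print(model.__class__.__name__)  # expected: 'Qwen2MoeForCausalLM'
```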
## Feature Description

Now this executes properly:

```python
from sparseml.transformers import SparseAutoModelForCausalLM

model = SparseAutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True)
print(model.__class__.__name__)
>> 'Phi3ForCausalLM'
```

The hack was to temporarily rename the class so that the...
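The description is cut off before it explains the rename in full; the sketch below only illustrates the general shape of a "temporarily rename the class" workaround (swapping a class's `__name__` for the duration of a call and restoring it afterwards). The helper and the class names here are assumptions for illustration, not the actual implementation in this PR.

```python
# Rough illustration only - NOT the PR's implementation, just the general
# shape of a temporary class-rename workaround.
from contextlib import contextmanager

@contextmanager
def temporarily_renamed(cls, name: str):
    """Swap cls.__name__ to `name` for the duration of the block."""
    original_name = cls.__name__
    cls.__name__ = name
    try:
        yield cls
    finally:
        cls.__name__ = original_name

# Hypothetical usage: present a wrapper class under an upstream name while
# remote code inspects it, then restore the wrapper's own name.
class SparseWrapper:  # stand-in for the real wrapper class
    pass

with temporarily_renamed(SparseWrapper, "AutoModelForCausalLM"):
    print(SparseWrapper.__name__)  # 'AutoModelForCausalLM'
print(SparseWrapper.__name__)      # 'SparseWrapper'
```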
## Feature description

Adding the logic for loading quantized models with additional kv cache quantization, generated using the `compressed-tensors` framework.

Notable changes:
- `CompressedTensorsConfig` now expects to read an optional...
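The list above is truncated before naming the optional field, so purely as an illustration, here is a hedged sketch of what a kv cache quantization entry in a `compressed-tensors`-style quantization config might look like. The `kv_cache_scheme` key and its parameters are assumptions based on the general shape of such configs, not copied from this PR.

```python
# Hedged sketch: a hypothetical `quantization_config` entry in a model's
# config.json with an additional kv cache quantization section.
# The `kv_cache_scheme` key and its fields are assumptions for illustration.
quantization_config = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
            "input_activations": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
        }
    },
    # assumed optional entry describing how k/v cache activations are quantized
    "kv_cache_scheme": {"num_bits": 8, "type": "int", "symmetric": True, "strategy": "tensor"},
}
```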