sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.61.1 to 4.66.3.

Release notes sourced from tqdm's releases:
- tqdm v4.66.3 stable: cli: eval safety (fixes CVE-2024-34062, GHSA-g7vv-2v7x-gj9p)
- tqdm v4.66.2 stable: pandas: add DataFrame.progress_map (#1549); notebook: fix...
This PR incorporates changes from @abhinavnmagic's PR https://github.com/neuralmagic/sparseml/pull/2222 into the new modifier UX. We introduce a new argument, `preserve_sparsity_mask`, in `SparseGPTModifier`, which can be used to extend or ignore the base...
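For context, a minimal sketch of how such a recipe might look. The `oneshot` entrypoint and the recipe keys below follow SparseML's OBCQ examples but are assumptions here, not confirmed by this PR; the model id is a placeholder:

```python
# Hypothetical sketch: applying SparseGPT one-shot with the new flag.
# Recipe structure and the `oneshot` entrypoint are assumptions based on
# SparseML's OBCQ examples; exact keys may differ by version.
from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

recipe = """
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      preserve_sparsity_mask: true  # respect/extend the existing mask
"""

model = SparseAutoModelForCausalLM.from_pretrained("some-model-id")  # placeholder
oneshot(model=model, dataset="open_platypus", recipe=recipe)
```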
Recently a bug was revealed: if the GPTQ modifier was applied consecutively after SparseGPT, the weight sparsity mask was not respected. This PR fixes that by preserving the mask,...
This pull request introduces an integration check to ensure the preservation of mask structure across consecutive runs. The process includes:
- **Initial pruning of the model** using a mask structure:...
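The core of such a check can be stated compactly: any weight zeroed by the initial pruning must still be zero after the second compression pass. A minimal sketch in plain PyTorch (illustrative only, not the test code from this PR):

```python
# Minimal sketch of a mask-preservation check: after a second compression
# pass (e.g. GPTQ after SparseGPT), every weight that was zero must stay zero.
import torch

def sparsity_mask(weight: torch.Tensor) -> torch.Tensor:
    """Boolean mask marking the non-zero entries of a weight tensor."""
    return weight != 0

def check_mask_preserved(before: torch.Tensor, after: torch.Tensor) -> None:
    pruned = ~sparsity_mask(before)          # entries zeroed by pruning
    still_zero = after[pruned] == 0          # must remain zero afterwards
    assert bool(still_zero.all()), "sparsity mask was not preserved"
```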
* Updated e2e regression tests for channelwise scale and zero-point; added a channelwise recipe
* Refactored the 1.1B test to run on a nightly cadence; the 15M test will run on each commit
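As background on what "channelwise scale and zero-point" means here (this is not the test code itself): each output channel gets its own quantization parameters instead of one pair for the whole tensor. A minimal symmetric-int8 sketch:

```python
# Minimal sketch of per-channel (channelwise) quantization parameters:
# one scale/zero-point per output channel rather than per tensor.
import torch

def channelwise_qparams(weight: torch.Tensor, num_bits: int = 8):
    qmax = 2 ** (num_bits - 1) - 1               # symmetric int8 range: [-127, 127]
    max_abs = weight.abs().amax(dim=1)           # one value per output channel (row)
    scale = max_abs.clamp(min=1e-8) / qmax       # avoid division by zero
    zero_point = torch.zeros_like(scale, dtype=torch.int64)  # symmetric -> 0
    return scale, zero_point

w = torch.randn(16, 64)                          # (out_channels, in_channels)
scale, zp = channelwise_qparams(w)
q = torch.clamp(torch.round(w / scale[:, None]), -127, 127)
```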
## Feature Description

Now this executes properly:
```python
from sparseml.transformers import SparseAutoModelForCausalLM

model = SparseAutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True
)
print(model.__class__.__name__)
# >> 'Phi3ForCausalLM'
```
The hack was to temporarily rename the class so that the...
Activation Ordering implementation. Checked the lm_eval value with `actorder=True`:
```
"metrics": [{"name": "word_perplexity,none", "value": 10.17568878732032}]
```
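For readers unfamiliar with the technique: activation ordering (actorder) in GPTQ-style quantization processes weight columns in decreasing order of an activation statistic (commonly the Hessian diagonal), so the most impactful columns are quantized first. A minimal sketch of the permutation step, which is illustrative and does not reproduce SparseML's implementation:

```python
# Minimal sketch of activation ordering: permute weight columns by a
# per-column activation statistic, process them in that order, then undo
# the permutation. Illustrative only.
import torch

def actorder_permutation(hessian_diag: torch.Tensor) -> torch.Tensor:
    return torch.argsort(hessian_diag, descending=True)

H_diag = torch.rand(64)                  # per-column activation statistic
perm = actorder_permutation(H_diag)
W = torch.randn(16, 64)
W_ordered = W[:, perm]                   # quantize columns in this order,
invperm = torch.argsort(perm)            # then restore the original layout
W_restored = W_ordered[:, invperm]
```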
Hi, while evaluating the performance of the quantized v8 models, I realized that the current export pipeline does something slightly different from how the models were actually exported for the...
Hi, I have a model that cannot be traced back to any of the default supported architectures (YOLO, LLMs, transformers...). I would like to see the benefits of sparsification on...
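For arbitrary PyTorch architectures, SparseML's documented generic pathway wraps your optimizer with a recipe-driven manager. A sketch, assuming the `ScheduledModifierManager` API as documented (verify against your installed version; the recipe path, model, and step count are placeholders):

```python
# Hypothetical sketch of SparseML's generic PyTorch integration for a
# custom architecture. The recipe path and training details are placeholders.
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

model = torch.nn.Sequential(                     # any custom architecture
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

manager = ScheduledModifierManager.from_yaml("recipe.yaml")  # placeholder path
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... run the usual training loop; the recipe's pruning modifiers apply
# and update sparsity masks as the wrapped optimizer steps ...

manager.finalize(model)
```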
Hi, I trained a YOLOv8 model and exported it to ONNX format with the quantization_recipe below. I set weight_bits=8 and activation_bits=8 to ensure the full-flow inference of the quantized model is...
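For reference, a recipe sketch of the kind the issue describes. The field names mirror the issue's wording (weight_bits=8, activation_bits=8); the exact modifier schema varies across SparseML versions, so treat this as illustrative, not canonical:

```python
# Hypothetical quantization recipe sketch matching the issue's description.
# The modifier schema is version-dependent; this is an assumption, not
# the issue author's actual recipe.
quantization_recipe = """
modifiers:
  - !QuantizationModifier
    start_epoch: 0.0
    weight_bits: 8        # as set in the issue
    activation_bits: 8    # as set in the issue
"""
```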