LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
In the original QuaRot method, the R2 rotation can be absorbed into the weights, so no online rotation is needed. https://github.com/ModelTC/llmc/blob/867fb4f536073a2898048c39aa098979521a45a6/llmc/compression/quantization/quarot.py#L139
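For reference, here is a minimal sketch (not llmc code; the shapes, layer names, and random rotation are illustrative assumptions) of how an orthogonal R2 rotation can be folded into a v_proj/o_proj weight pair offline, so that no online rotation is required:

```python
import torch

def random_orthogonal(n: int) -> torch.Tensor:
    # QR of a Gaussian matrix yields a random orthogonal matrix
    q, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64))
    return q

# Illustrative shapes (assumptions, not from the issue)
hidden, num_heads, head_dim = 64, 4, 16
W_v = torch.randn(num_heads * head_dim, hidden, dtype=torch.float64)  # v_proj weight
W_o = torch.randn(hidden, num_heads * head_dim, dtype=torch.float64)  # o_proj weight

# Per-head block-diagonal rotation built from one head_dim x head_dim R2
R2 = random_orthogonal(head_dim)
R2_block = torch.block_diag(*([R2] * num_heads))

# Fold R2 into the weights offline: R2_block.T @ R2_block = I, so the
# end-to-end mapping is unchanged while the intermediate activation is rotated.
W_v_rot = R2_block @ W_v       # rotate v_proj outputs
W_o_rot = W_o @ R2_block.T     # counter-rotate o_proj inputs

x = torch.randn(hidden, dtype=torch.float64)
print(torch.allclose(W_o @ (W_v @ x), W_o_rot @ (W_v_rot @ x)))  # True
```

Because the rotation cancels exactly in full precision, only the stored weights change; the quantizer then operates on the rotated distributions without any extra computation at inference time.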
1. Current status: AWQ quantization of deepseek-r1 with clip_version: v2 reports an error.
2. The awq_w_only.yml parameters are as follows:
   base:
       seed: &seed 42
   model:
       type: DeepseekV3
       path: /mnt/DeepSeek-R1
       tokenizer_mode: slow
       torch_dtype: auto
   calib:
       name: pileval
       download: False
       path: /home/llmc/data/pileval
       n_samples: 128
       bs: -1
       seq_len: 512
       preproc: ...
Hello, thank you for your work. I’m very interested in the distributions of activations and weights after quantizing a pruned model. Is there a way to extract the input tensors...
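One generic way to do this (a plain PyTorch sketch, not an llmc API) is to register forward hooks on the layers of interest and keep copies of their inputs while running calibration data through the model:

```python
import torch
import torch.nn as nn

captured = {}  # layer name -> list of captured input tensors

def make_hook(name):
    def hook(module, inputs, output):
        # inputs is a tuple; store a detached CPU copy of the first input
        captured.setdefault(name, []).append(inputs[0].detach().float().cpu())
    return hook

def attach_input_hooks(model):
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Toy usage: after a forward pass, captured[name] holds each Linear's inputs
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
handles = attach_input_hooks(model)
model(torch.randn(4, 16))
for h in handles:
    h.remove()
print({name: tensors[0].shape for name, tensors in captured.items()})
```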
Hello, can llmc support Whisper model quantization? Or what modifications need to be made to llmc to support quantization of the Whisper model?
Hi, are there any plans to support quantization of the gemma3 model?
Hello, when quantizing Qwen-VL-7B, I used awq_w_only.yml to quantize the language layers' parameters to 4 bits, and set save_vllm=True on export to save the real-quantized model. Why is the exported model larger than the original one? (The exported model is 28 GB; the original is 16 GB.)
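For scale, a back-of-the-envelope check of what each storage format would imply; the ~8B parameter count and group size below are assumptions for illustration, not figures from the issue:

```python
# Hypothetical parameter count and group size, for illustration only
n_params = 8e9
group_size = 128

print(f"fp32 weights: {n_params * 4 / 1e9:.0f} GB")   # ~32 GB
print(f"fp16 weights: {n_params * 2 / 1e9:.0f} GB")   # ~16 GB
packed = n_params * 0.5 + (n_params / group_size) * 2  # int4 payload + fp16 scales
print(f"packed int4:  {packed / 1e9:.1f} GB")          # ~4.1 GB
```

Under these assumptions, a 16 GB original matches fp16 storage and a genuinely packed 4-bit export would be around 4 GB, while a 28 GB file is closer to what fp32 storage of the weights would produce.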
The scales computed in get_float_qparams always have the same shape as the tensor. Does this mean the quantization granularity (per-tensor/group/channel) has no effect?
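For context, the scale shape is what distinguishes the granularities. A minimal sketch (independent of llmc's get_float_qparams implementation; symmetric int4 and a group size of 16 are assumed) of the shapes one would expect for a weight of shape (out, in):

```python
import torch

W = torch.randn(8, 32)  # illustrative (out=8, in=32) weight

# per-tensor: one scale for the whole tensor -> shape ()
s_tensor = W.abs().max() / 7  # symmetric int4 range [-7, 7]

# per-channel: one scale per output channel -> shape (8, 1)
s_channel = W.abs().amax(dim=1, keepdim=True) / 7

# per-group: one scale per group of input dims (group size 16) -> shape (8, 2)
s_group = W.view(8, -1, 16).abs().amax(dim=-1) / 7

print(s_tensor.shape, s_channel.shape, s_group.shape)
```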