
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

47 LightCompress issues

It is `int_quant`/`float_quant` in these two methods, but `int-quant`/`float-quant` in the samples and other methods. Should these be aligned, or is the difference intentional?

Hi, there are some older and newer datatypes like mx6/mx9/mxfp4 and nvfp4_e2m1. Recently NVIDIA has supported quantization of the DeepSeek-R1 model: https://huggingface.co/nvidia/DeepSeek-R1-FP4 Can llmc support such new datatypes in the...

I'm trying to test this with RTN, but when running eval for the transformed model I get the following datatype error. This is my config file:

```yaml
base:
  seed: &seed 42
model:
  type: Llama...
```

Hello, LLMC team! I was glad to discover that you added support for the `HumanEval` benchmark. However, when trying to produce sanity results for unquantized models, they were significantly lower than...

I obtained an FP8 Llama3 using llmc's awq_fp8.yml. How can I pass it in as a dense model and continue with int8 quantization? I currently tried modifying the model path directly, but the PPL test results are unreasonable. ![Image](https://github.com/user-attachments/assets/05b6889a-5719-4618-ab70-48b549809e8d)
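A minimal sketch of the two-stage flow this question describes, based only on the config fields visible in the snippets on this page. The `save` section and all field names not shown in the snippets (`save_trans`, `save_path`, the `quant` block, the output directory) are assumptions for illustration, not verified against the repo:

```yaml
# Stage 1 (awq_fp8.yml): save the transformed model as a dense checkpoint.
# `save_trans` / `save_path` are assumed option names.
save:
  save_trans: True
  save_path: ./llama3_fp8_trans

# Stage 2: a separate int8 config whose model path points at the stage-1 output.
base:
  seed: &seed 42
model:
  type: Llama
  path: ./llama3_fp8_trans      # assumed: directory written by stage 1
  torch_dtype: auto
quant:                           # field names in this block are assumptions
  method: RTN
  weight:
    bit: 8
    symmetric: True
```

If the PPL after the second pass is still unreasonable, one thing to check is whether stage 1 actually exported a dense (de-quantized/transformed) checkpoint rather than packed FP8 weights, since only the former can be re-quantized as an ordinary model.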

Dear LLMC team, thank you for your continuous effort in maintaining this useful repo. I've been using it for a long time, and would like to draw your attention to...

What could be causing an OOM during the replace-model stage? The model is Qwen32B, running on 7 A6000 GPUs. nvidia-smi shows all 7 cards active, but the model seems to be placed only on gpu0. ![Image](https://github.com/user-attachments/assets/b59fad31-8890-43ea-869f-f809f7985c66) Config file:

```yaml
base:
  seed: &seed 42
model:
  type: Qwen2
  path: ./DeepSeek-R1-Distill-Qwen-32B
  tokenizer_mode: slow
  torch_dtype: auto
# calib:
#   name: pileval
#   download: True
#   path: ./LLMCompress/data...
```

Hi, thank you for your great work on this repository! I am currently exploring the use of AWQ for model quantization and was wondering if the repository supports evaluating models...

**When I perform the multi-GPU evaluation for DeepSeekv3, the following error message is displayed:** 2025-02-21 16:29:40.579 | INFO | llmc.eval.eval_base:__init__:21 - eval_cfg : {'eval_pos': ['pretrain', 'transformed', 'fake_quant'], 'name': 'wikitext2', 'download':...

Have you evaluated this algorithm, and is there any plan to support it?