
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

47 LightCompress issues

The docs say a custom calibration dataset should be a txt file with one text per line:

```
calib:
  name: custom
  download: False
  load_from_txt: True
  path: # Custom dataset, ending with txt as suffix
  n_samples: 128
  bs: -1
  seq_len: 512
  preproc: random_truncate_txt
  seed: *seed
```

Running it throws an error: ...
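For reference, the documented format is plain text with one calibration sample per line. Below is a minimal sketch of writing such a file, plus a guess at what a `random_truncate_txt`-style preprocessor does; the `random_truncate` helper is an assumption for illustration, not llmc's actual implementation:

```python
import random

# Custom calibration file: plain txt, one calibration text per line.
samples = [
    "The quick brown fox jumps over the lazy dog.",
    "Quantization maps model weights onto a low-bit integer grid.",
]
with open("calib.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(samples))

def random_truncate(tokens, seq_len, rng):
    """Hypothetical random truncation: keep a random contiguous window
    of at most seq_len tokens from one calibration line."""
    if len(tokens) <= seq_len:
        return tokens
    start = rng.randrange(len(tokens) - seq_len + 1)
    return tokens[start:start + seq_len]

rng = random.Random(42)
short = random_truncate([1, 2, 3], 512, rng)           # shorter than seq_len: unchanged
window = random_truncate(list(range(1000)), 512, rng)  # random 512-token window
```

With `seq_len: 512` in the config above, each line longer than 512 tokens would be cut down to one such window per calibration sample.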

Very good work, but I have some questions. When I tried to run the code, I encountered the following error: [rank0]: Traceback (most recent call last): [rank0]: File...

Dear LLMC team, Thank you for adding support for KV cache quantization, I found it really useful! I'd like to share two observations that affect the correctness and efficiency of...

Dear LLMC team, I've been trying to run mixed-precision PTQ using RTN. I suspect there's a bug, as the **non-default settings in `mix_bits` are ignored**. My understanding of the...
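For context, the behaviour one would expect from a `mix_bits` override can be sketched as follows; the schema, pattern matching, and names here are illustrative assumptions, not llmc's actual config handling:

```python
import fnmatch

def resolve_bits(layer_names, default_bit, mix_bits):
    """Illustrative per-layer bit-width resolution: layers whose name
    matches a mix_bits pattern take the override; all others keep
    default_bit."""
    resolved = {}
    for name in layer_names:
        resolved[name] = default_bit
        for pattern, bit in mix_bits.items():
            if fnmatch.fnmatch(name, pattern):
                resolved[name] = bit  # non-default setting must win
    return resolved

layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
]
bits = resolve_bits(layers, 4, {"*down_proj": 8})
```

The bug report above amounts to saying that, in practice, every layer ends up with the default bit-width as if the override step never ran.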

bug

### Error log: [W416 15:37:25.363094293 socket.cpp:933] [c10d] The server socket on [::ffff:36.xxx.xxx.13]:40613 has timed out, will retry. [W416 15:39:40.531073896 socket.cpp:933] [c10d] The server socket on [::ffff:36.xxx.xxx.13]:40613 has timed out, will retry....

https://github.com/ModelTC/llmc/blob/main/llmc/compression/quantization/quant.py Line 629 reads `deficiency = self.group_size - tensor.shape[1] % self.group_size`. Shouldn't this use `tensor.shape[-1]`?
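For a 2-D weight matrix `shape[1]` and `shape[-1]` coincide, but for higher-rank tensors they do not, which is presumably the reporter's point: per-group quantization groups along the last (hidden) dimension. A quick sketch of the padding arithmetic in pure Python (not the llmc code itself):

```python
def pad_deficiency(shape, group_size):
    """Number of zero columns to append so the LAST dimension becomes a
    multiple of group_size (the outer `% group_size` makes an exact
    multiple pad 0 instead of a full group)."""
    return (group_size - shape[-1] % group_size) % group_size

# 2-D weight: indexing dim 1 or dim -1 gives the same answer.
d2 = pad_deficiency((4096, 70), 32)     # 70 -> pad 26 -> 96
# 3-D tensor (batch, seq, hidden): dim 1 would be seq_len, but the
# grouping runs over the hidden (last) dimension.
d3 = pad_deficiency((8, 512, 70), 32)   # also keyed off the last dim
```

So for any tensor with more than two dimensions, using `shape[1]` would compute the deficiency against the wrong axis.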

As far as I can tell, there are only four calibration datasets ('pileval', 'c4', 'wikitext2', and 'ptb'), and they only support text-modal LLMs. I would like to ask how to...

I'm encountering some issues saving/loading fake-quant models. I'm trying to save with the `save_fake` option and load the result again to check it. How can I load a fake-quant model in...
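As background, a fake-quant checkpoint stores weights as ordinary floats that have merely been rounded onto the quantization grid, so in principle it loads through the normal model-loading path. A minimal sketch of that round-to-grid step, assuming symmetric integer quantization (an illustration, not llmc's actual `save_fake` logic):

```python
def fake_quantize(w, bits, scale):
    """Round one float weight onto a symmetric signed int grid, then
    dequantize back to float; the result is a regular float weight that
    any standard checkpoint loader can read."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))  # clamp to int range
    return q * scale

weights = [0.37, -1.22, 0.051]
fq = [fake_quantize(w, 4, 0.1) for w in weights]  # floats snapped to the 0.1 grid
```

This is why fake-quant output measures accuracy but not speed: the stored tensors are still full-precision floats, just restricted to grid values.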

When I use the following configuration file:

```yaml
base:
  seed: &seed 42
model:
  type: DeepseekV3
  path: xxx
  tokenizer_mode: fast
  torch_dtype: torch.float8_e4m3fn
calib:
  name: pileval
  download: False
  path: xxx
  n_samples: 128
```
...

I'm encountering an Out-of-Memory (OOM) error when trying to run the second step (OmniQuant) of the combined AWQ + OmniQuant quantization method. This happens despite having allocated 2 A40 GPUs...