zengchao0424

Results 7 comments of zengchao0424

> hi! where is the pre-trained ABQ-LLM model zoo?

Hello, we will release the quantized models mentioned in the paper, pending internal company approval. We will share the HF link...

If you want to customize your own calibration dataset, you can add support for it in the `get_loader` function inside `ABQ-LLM/algorithm/datautils.py`. For example, you can implement the `get_chat` function to...
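A minimal sketch of what such a loader could look like (the dataset path, the `"text"` field name, and the `get_chat` signature below are assumptions for illustration, not the exact ABQ-LLM code):

```python
import random
from datasets import load_dataset
from transformers import AutoTokenizer

def get_chat(nsamples, seed, seqlen, model):
    """Hypothetical loader for a custom chat-style calibration dataset."""
    tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
    # Replace the path and field name with your own calibration data.
    data = load_dataset("json", data_files="my_chat_data.json", split="train")
    enc = tokenizer("\n\n".join(data["text"]), return_tensors="pt")

    random.seed(seed)
    trainloader = []
    for _ in range(nsamples):
        # Sample a random seqlen-long window from the tokenized corpus.
        i = random.randint(0, enc.input_ids.shape[1] - seqlen - 1)
        inp = enc.input_ids[:, i : i + seqlen]
        tar = inp.clone()
        tar[:, :-1] = -100  # mask all but the last token in the target
        trainloader.append((inp, tar))
    return trainloader, enc

# Then dispatch to it from get_loader, e.g.:
# if "chat" in name:
#     return get_chat(nsamples, seed, seqlen, model)
```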

Hello, the weights obtained this way are the calibrated fake-quantized weights. To achieve actual weight compression, a packing operation is required to store the weights. For example, using...
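For illustration, a minimal packing sketch (not ABQ-LLM's actual packing kernel) that stores two 4-bit quantized values per int8 byte, cutting storage from 8 to 4 bits per weight:

```python
import torch

def pack_int4(qweight: torch.Tensor) -> torch.Tensor:
    """qweight: integer tensor with values in [0, 15], last dim must be even."""
    assert qweight.shape[-1] % 2 == 0
    q = qweight.to(torch.uint8)
    low = q[..., 0::2]           # even columns -> low nibble
    high = q[..., 1::2]          # odd columns  -> high nibble
    return (high << 4) | low     # two 4-bit weights per byte

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_int4: recover the original [0, 15] values."""
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    out = torch.stack((low, high), dim=-1)
    return out.reshape(*packed.shape[:-1], -1)
```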

Thanks for your attention to our work. Matrix multiplication between int and float operands is not supported, but based on our experience in model optimization, the effect of int16 and float16...

Hello, Qwen2 implements its attention computation with GQA. We have added GQA support in our implementation, so our LLaMA code path can also handle GQA models such as LLaMA-3. The model...
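As a rough illustration (a sketch of the standard GQA expansion, not our exact kernel): the key/value heads are repeated so they match the number of query heads before the attention matmul.

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """x: (batch, num_kv_heads, seq_len, head_dim)
    -> (batch, num_kv_heads * n_rep, seq_len, head_dim)."""
    if n_rep == 1:
        return x
    b, h_kv, s, d = x.shape
    x = x[:, :, None, :, :].expand(b, h_kv, n_rep, s, d)
    return x.reshape(b, h_kv * n_rep, s, d)

# e.g. a model with 32 query heads sharing 8 KV heads -> n_rep = 4
k = torch.randn(1, 8, 128, 64)
k_expanded = repeat_kv(k, n_rep=32 // 8)   # -> (1, 32, 128, 64)
```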

In practice, W4A4 low-bit quantization algorithms optimized under the "smooth paradigm" often encounter performance bottlenecks on complex evaluation tasks, particularly when applied to cutting-edge models such as LLaMA-3.1. These state-of-the-art...
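For background, a minimal sketch of the SmoothQuant-style smoothing step that the "smooth paradigm" refers to (the alpha value and the folding point are assumptions for illustration): activation outliers are migrated into the weights via a per-input-channel scale, so that `(X / s) @ (W * s).T == X @ W.T` is preserved.

```python
import torch

def smooth_scales(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """act_absmax: per-input-channel activation max |x|; weight: (out, in)."""
    w_absmax = weight.abs().amax(dim=0)                            # per input channel
    s = (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)
    return s

# Fold the scales: divide the activations, multiply the weight columns.
act_absmax = torch.rand(4096) * 10
W = torch.randn(4096, 4096)
s = smooth_scales(act_absmax, W)
W_smoothed = W * s          # scale each input channel of the weight
# at runtime: X_smoothed = X / s (often folded into the preceding LayerNorm)
```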

Hi, for the W4A16 configuration, most mainstream quantization algorithms can achieve good results. If you use a per-channel setting for W4A16, you can choose whichever quantization algorithm you prefer. However, in...
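As a reference point, a minimal sketch of per-channel 4-bit weight fake quantization (not ABQ-LLM's exact quantizer): each output channel gets its own scale and zero-point.

```python
import torch

def quantize_w4_per_channel(w: torch.Tensor, n_bits: int = 4):
    """w: (out_features, in_features) weight matrix."""
    qmax = 2 ** n_bits - 1
    w_min = w.amin(dim=1, keepdim=True)            # per output channel
    w_max = w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    w_deq = (q - zero) * scale                     # fake-quantized weights (still fp)
    return q.to(torch.uint8), scale, zero, w_deq

q, scale, zero, w_fq = quantize_w4_per_channel(torch.randn(4096, 4096))
```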