Qwen2.5-Omni-7B with Auto-Round
Is there any plan to release a Qwen2.5-Omni-7B quantized model usable with AutoRound? The model demands a huge amount of memory, and with AutoRound it could be much more usable, I guess. Or any idea how to handle that model with AutoRound?
Is there any issue with the current code? If the code has problems, or if you don't have sufficient hardware resources, you could try RTN mode (`iters=0`) first; it is very fast and requires fewer resources. If you encounter any issues, please report them to us and we will do our best to fix them.
We will also give it a try when we have time. Since we are a very small team not only exploring algorithms but also maintaining this repository and generating quantized models, we are not committed to quantizing every popular model.
@wenhuach21 The current code and model need about 360 GB of VRAM, which is far beyond a typical hardware budget.
Could you check the following document to try reducing GPU memory usage, for example by enabling `low_gpu_mem_usage`:
https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#adjust-hyperparameters
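A rough sketch of what that could look like with AutoRound's Python API. This is an assumption-heavy illustration, not a verified recipe: the exact `AutoRound` constructor arguments may differ across versions, and Qwen2.5-Omni is a multimodal model, so the plain `AutoModelForCausalLM` loading shown here may need to be replaced with the model-specific class from its model card.

```python
# Hypothetical sketch: quantize with reduced GPU memory via low_gpu_mem_usage.
# Parameter names follow the AutoRound docs linked above, but verify them
# against the version you have installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-Omni-7B"
# Multimodal models may require a different loader class than this one.
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    low_gpu_mem_usage=True,  # trade speed for lower VRAM, per the docs
)
autoround.quantize()
autoround.save_quantized("./Qwen2.5-Omni-7B-int4", format="auto_round")
```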
If none of the above works, you can use RTN mode (`--iters 0`), which typically requires less than 5 GB of VRAM.
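For reference, an RTN run from the command line might look like the sketch below. The flag names (`--iters`, `--format`, `--output_dir`) are taken from the AutoRound documentation but should be checked against `auto-round --help` for the installed version; `--iters 0` is what switches tuning off and falls back to plain round-to-nearest.

```shell
# Hypothetical CLI invocation: RTN-only quantization (no iterative tuning),
# which keeps VRAM usage low at some cost in accuracy.
auto-round \
  --model Qwen/Qwen2.5-Omni-7B \
  --bits 4 \
  --group_size 128 \
  --iters 0 \
  --format auto_round \
  --output_dir ./Qwen2.5-Omni-7B-int4-rtn
```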