
Qwen2.5-Omni-7B with Auto-Round

Open Tortoise17 opened this issue 4 months ago • 3 comments

Is there any plan to release a Qwen2.5-Omni-7B quantized with auto-round? The full model demands a huge amount of memory, and with auto-round it could be easily usable, I guess. Or any idea how to handle that model with auto-round?

Tortoise17 avatar Aug 29 '25 08:08 Tortoise17

Is there any issue with the current code? If the code has problems, or if you don't have sufficient hardware resources, you could try RTN mode (iters=0) first; it is very fast and requires fewer resources. If you encounter any issues, please report them to us and we will do our best to fix them.

We will also give it a try when we have time. Since we are a very small team that not only explores algorithms but also maintains this repository and generates quantized models, we cannot commit to quantizing every popular model.

wenhuach21 avatar Aug 29 '25 08:08 wenhuach21

@wenhuach21 The current code and model need 360 GB of VRAM, which is much higher than typical resources allow.

Tortoise17 avatar Aug 29 '25 08:08 Tortoise17

Could you check the following document and try reducing GPU memory usage, for example by enabling "low_gpu_mem_usage":

https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#adjust-hyperparameters

If none of the above works, you can use RTN mode (--iters 0), which typically requires less than 5 GB of VRAM.
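As a rough sketch of the suggestion above (flag names follow the auto-round documentation linked earlier; whether the CLI fully supports Qwen2.5-Omni's multimodal architecture is an assumption to verify locally), an RTN-style run combining `--iters 0` with reduced GPU memory usage might look like:

```shell
# Sketch only: assumes auto-round is installed (pip install auto-round)
# and that its CLI accepts this model; adjust flags to your version.
# --iters 0 skips the tuning loop and falls back to RTN (round-to-nearest),
# which is fast and needs far less VRAM than full tuning.
# --low_gpu_mem_usage trades speed for lower peak GPU memory.
auto-round \
  --model Qwen/Qwen2.5-Omni-7B \
  --bits 4 \
  --group_size 128 \
  --iters 0 \
  --low_gpu_mem_usage \
  --output_dir ./Qwen2.5-Omni-7B-w4g128
```

If RTN quality is insufficient, the same command with the default iteration count (dropping `--iters 0`) runs the full tuning algorithm at the cost of more memory and time.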

wenhuach21 avatar Aug 29 '25 08:08 wenhuach21