Qwen2.5-Omni-7B with Auto-Round
Is there any plan to release a Qwen2.5-Omni-7B quantized model usable with AutoRound? The model demands a huge amount of memory, and with AutoRound it could be much more usable, I guess. Or any idea how to handle that model with AutoRound?
Is there any issue with the current code? If the code has problems, or if you don't have sufficient hardware resources, you could try RTN mode (`iters=0`) first; it is very fast and requires fewer resources. If you encounter any issues, please report them to us and we will do our best to fix them.
We will also give it a try when we have time. Since we are a very small team not only exploring algorithms but also maintaining this repository and generating quantized models, we are not committed to quantizing every popular model.
@wenhuach21 The current code and model need about 360 GB of VRAM, which is far beyond a typical hardware budget.
Could you check the following document to try reducing GPU memory usage, for example by enabling `low_gpu_mem_usage`:
https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#adjust-hyperparameters
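A rough sketch of what that could look like with AutoRound's Python API. This is an assumption-heavy illustration, not a verified recipe: the exact `AutoRound` constructor arguments may differ across versions, and Qwen2.5-Omni is a multimodal model, so the plain `AutoModelForCausalLM` loading shown here may need to be replaced with the model-specific class from its model card.

```python
# Hypothetical sketch: quantize with reduced GPU memory via low_gpu_mem_usage.
# Parameter names follow the AutoRound docs linked above, but verify them
# against the version you have installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-Omni-7B"
# Multimodal models may require a different loader class than this one.
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    low_gpu_mem_usage=True,  # trade speed for lower VRAM, per the docs
)
autoround.quantize()
autoround.save_quantized("./Qwen2.5-Omni-7B-int4", format="auto_round")
```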
If none of the above works, you can use RTN mode (`--iters 0`), which typically requires less than 5 GB of VRAM.
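For reference, an RTN run from the command line might look like the sketch below. The flag names (`--iters`, `--format`, `--output_dir`) are taken from the AutoRound documentation but should be checked against `auto-round --help` for the installed version; `--iters 0` is what switches tuning off and falls back to plain round-to-nearest.

```shell
# Hypothetical CLI invocation: RTN-only quantization (no iterative tuning),
# which keeps VRAM usage low at some cost in accuracy.
auto-round \
  --model Qwen/Qwen2.5-Omni-7B \
  --bits 4 \
  --group_size 128 \
  --iters 0 \
  --format auto_round \
  --output_dir ./Qwen2.5-Omni-7B-int4-rtn
```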