Support inference of AWQ INT4 Yi-34B model from QLoRA
I will provide an AWQ model from the customer, and the customer will evaluate FP8 and INT4 performance.
Hi, I have verified that AWQ models can be supported (loaded in vLLM and converted to LowBitLinear in ipex-llm), but only the asym_int4 quantization format is supported.
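For reference, a minimal sketch of loading an AWQ checkpoint through ipex-llm's transformers-style API with the asym_int4 format mentioned above; the model path is hypothetical, and this assumes an AWQ-quantized Yi-34B checkpoint is available locally:

```python
# Minimal sketch: load an AWQ checkpoint with ipex-llm and run generation.
# Supported layers are converted to LowBitLinear at load time; per the
# comment above, only asym_int4 is supported for AWQ checkpoints.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/Yi-34B-AWQ"  # hypothetical local checkpoint path

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="asym_int4",  # only asym_int4 is supported for AWQ
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```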
This feature will need some adaptation on both the vLLM side and the ipex-llm side. I will update this thread once the supporting PRs are merged.
Fixed.