Support inference of AWQ INT4 Yi-34B model from QLoRA
I will provide an AWQ model from the customer, and the customer will evaluate FP8 and INT4 performance.
Hi, I have verified that AWQ models can be supported (loaded in vLLM and converted to LowBitLinear in ipex-llm), but only the asym_int4 quantization format is supported.
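For reference, a minimal sketch of loading an AWQ checkpoint through ipex-llm's transformers-style API with the asym_int4 format mentioned above; the model path is hypothetical, and this assumes an AWQ-quantized Yi-34B checkpoint is available locally:

```python
# Minimal sketch: load an AWQ checkpoint with ipex-llm and run generation.
# Supported layers are converted to LowBitLinear at load time; per the
# comment above, only asym_int4 is supported for AWQ checkpoints.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/Yi-34B-AWQ"  # hypothetical local checkpoint path

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="asym_int4",  # only asym_int4 is supported for AWQ
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```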
This feature will need some adaptation on both the vLLM side and the ipex-llm side. I will update this thread once the supporting PRs are merged.
Fixed.