llm-inference
Support Quantized Model
Please add support for quantized models, for example:
https://huggingface.co/THUDM/chatglm2-6b-int4
https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
Without quantization support, inference speed is slow.
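For context, the linked checkpoints use int4 weight quantization (GPTQ in the Qwen case), which stores each weight in 4 bits plus a per-group scale factor, cutting memory use roughly 4x versus fp16. A minimal illustrative sketch of symmetric int4 quantization (plain Python, not the actual GPTQ algorithm, which additionally minimizes layer output error):

```python
# Illustrative sketch of symmetric int4 quantization: each float weight
# is mapped to an integer in [-8, 7] plus one shared scale factor.

def quantize_int4(weights):
    """Quantize a list of floats to int4 values and a scale."""
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive int4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 values."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)
# Each recovered weight differs from the original by at most scale / 2.
```

Supporting such models in practice mostly means wiring in the right dequantization kernels (e.g. the auto-gptq backend for GPTQ checkpoints) so the packed int4 weights can be used directly at inference time.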