llm-inference
Support Quantized Model
Please add support for quantized models, for example:
https://huggingface.co/THUDM/chatglm2-6b-int4
https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
Without quantization support, inference speed is slow.
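For context, the linked checkpoints use int4 weight quantization (GPTQ in the Qwen case), which stores each weight in 4 bits plus a per-group scale factor, cutting memory use roughly 4x versus fp16. A minimal illustrative sketch of symmetric int4 quantization (plain Python, not the actual GPTQ algorithm, which additionally minimizes layer output error):

```python
# Illustrative sketch of symmetric int4 quantization: each float weight
# is mapped to an integer in [-8, 7] plus one shared scale factor.

def quantize_int4(weights):
    """Quantize a list of floats to int4 values and a scale."""
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive int4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 values."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)
# Each recovered weight differs from the original by at most scale / 2.
```

Supporting such models in practice mostly means wiring in the right dequantization kernels (e.g. the auto-gptq backend for GPTQ checkpoints) so the packed int4 weights can be used directly at inference time.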