LLMZoo
Running the model in 8-bit is too slow!
Hi, I am running the model chimera-inst-chat-13b in 8-bit on an A100, and it takes almost double the time of the FP16 version. Is this normal?
I also found in Hugging Face's blog that 8-bit LLM inference is slower than FP16: https://huggingface.co/blog/hf-bitsandbytes-integration
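For reference, here is a minimal timing sketch of the kind of comparison being discussed, assuming the model is loaded through Hugging Face transformers with bitsandbytes 8-bit quantization (the hub id below is a placeholder, and LLMZoo's own loading code may differ):

```python
# Minimal sketch: compare generation latency of int8 vs FP16 loading.
# Assumptions: transformers + accelerate + bitsandbytes installed;
# the model id is a placeholder, not necessarily the real hub path.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "FreedomIntelligence/chimera-inst-chat-13b"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(MODEL)


def time_generate(model, prompt, max_new_tokens=128):
    """Time a single generate() call, synchronizing the GPU around it."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    return time.perf_counter() - start


prompt = "Explain quantization in one sentence."

# 8-bit: weights quantized with bitsandbytes LLM.int8()
model_int8 = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", load_in_8bit=True
)
print("int8:", time_generate(model_int8, prompt))
del model_int8
torch.cuda.empty_cache()

# FP16 baseline
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", torch_dtype=torch.float16
)
print("fp16:", time_generate(model_fp16, prompt))
```

The slowdown itself matches the linked blog post: bitsandbytes' LLM.int8() trades speed for memory, since its mixed-precision decomposition (handling outlier features in FP16 while the rest runs in int8) adds overhead on top of the matrix multiplications.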
Hi @ananwjq,
Thanks for your feedback!
We are working on efficient inference and will give a detailed comparison.
Best, Zhihong
Looking forward to the results!