LLMZoo
Running the model in 8-bit is too slow!
Hi, I am running the model chimera-inst-chat-13b in 8-bit on an A100, and it takes almost double the time of the FP16 version. Is this normal?
I also found in Hugging Face's blog that 8-bit LLM inference is slower than FP16: https://huggingface.co/blog/hf-bitsandbytes-integration
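For reference, here is a minimal timing sketch of the kind of comparison being discussed, assuming the model is loaded through Hugging Face transformers with bitsandbytes 8-bit quantization (the hub id below is a placeholder, and LLMZoo's own loading code may differ):

```python
# Minimal sketch: compare generation latency of int8 vs FP16 loading.
# Assumptions: transformers + accelerate + bitsandbytes installed;
# the model id is a placeholder, not necessarily the real hub path.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "FreedomIntelligence/chimera-inst-chat-13b"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(MODEL)


def time_generate(model, prompt, max_new_tokens=128):
    """Time a single generate() call, synchronizing the GPU around it."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    return time.perf_counter() - start


prompt = "Explain quantization in one sentence."

# 8-bit: weights quantized with bitsandbytes LLM.int8()
model_int8 = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", load_in_8bit=True
)
print("int8:", time_generate(model_int8, prompt))
del model_int8
torch.cuda.empty_cache()

# FP16 baseline
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", torch_dtype=torch.float16
)
print("fp16:", time_generate(model_fp16, prompt))
```

The slowdown itself matches the linked blog post: bitsandbytes' LLM.int8() trades speed for memory, since its mixed-precision decomposition (handling outlier features in FP16 while the rest runs in int8) adds overhead on top of the matrix multiplications.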
Hi @ananwjq,
Thanks for your feedback!
We are working on efficient inference and will give a detailed comparison.
Best, Zhihong
Looking forward to the results!