Support BLOOM
BLOOM is an open-source LLM developed by BigScience. The BLOOM models rank highly in Hugging Face downloads. It'd be great to have these models in our catalog.
+1 - looking forward to Bloom in vLLM
@wangkuiyi @ruidongtd @createmomo @nuass @wengrx @rossbucky @bsabri We've just added BLOOM. You can immediately use it by installing vLLM from source.
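If you want to try it right away, here's a minimal sketch using the offline `LLM` API (the `bigscience/bloom-7b1` checkpoint name is just an example; any BLOOM variant on the Hugging Face Hub should work):

```python
from vllm import LLM, SamplingParams

# Download and load a BLOOM checkpoint from the Hugging Face Hub.
llm = LLM(model="bigscience/bloom-7b1")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```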
Super nice~
I used vLLM to try to speed up my BLOOM model, but found that the speed did not improve. Moreover, vLLM's memory usage is higher. What might be the reason?
vLLM: (screenshot not preserved)
HF: (screenshot not preserved)
Hi @Hukongtao, thanks for trying out vLLM! The memory usage is high because vLLM pre-allocates space for the KV cache. You can configure the memory usage by tuning the `gpu_memory_utilization` parameter, which is 0.9 (i.e., 90% of your GPU memory capacity) by default.
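For example (a minimal sketch; the checkpoint name and the 0.5 value are just illustrations):

```python
from vllm import LLM

# Cap vLLM's pre-allocation at 50% of GPU memory instead of the default 90%.
llm = LLM(model="bigscience/bloom-7b1", gpu_memory_utilization=0.5)
```

Note that a smaller value leaves less room for the KV cache, which can reduce the batch size vLLM can serve concurrently.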
As for speed, could you share the model (and its size) you are using, along with your benchmark results?
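In case it helps, here's one rough way to get a comparable throughput number (a wall-clock sketch under assumed settings, not an official benchmark script):

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="bigscience/bloom-7b1")  # example checkpoint
prompts = ["Hello, my name is"] * 32     # small batch for a rough number
params = SamplingParams(temperature=0.0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests to compute tokens/s.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s")
```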
@WoosukKwon Thanks for your reply. Great job!