
Support BLOOM

Open WoosukKwon opened this issue 1 year ago • 8 comments

BLOOM is an open-source LLM developed by BigScience. The BLOOM models rank among the most downloaded on Hugging Face. It'd be great to have them in our catalog.

WoosukKwon avatar May 03 '23 20:05 WoosukKwon

+1 - looking forward to Bloom in vLLM

wangkuiyi avatar Jun 21 '23 03:06 wangkuiyi

+1

ruidongtd avatar Jun 21 '23 05:06 ruidongtd

+1

sharlec avatar Jun 21 '23 18:06 sharlec

+1

createmomo avatar Jun 22 '23 16:06 createmomo

+1

nuass avatar Jun 26 '23 03:06 nuass

+1

wengrx avatar Jun 26 '23 07:06 wengrx

+1

gyin94 avatar Jun 29 '23 22:06 gyin94

+1

bsabri avatar Jun 30 '23 18:06 bsabri

@wangkuiyi @ruidongtd @createmomo @nuass @wengrx @rossbucky @bsabri We've just added BLOOM. You can immediately use it by installing vLLM from source.
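For anyone landing here, a minimal usage sketch; the bigscience/bloom-560m checkpoint is just an illustrative example, and any BLOOM variant should work the same way:

```python
from vllm import LLM, SamplingParams

# Load a BLOOM checkpoint from the Hugging Face Hub.
# bloom-560m is a small variant chosen here for quick testing;
# larger checkpoints such as bigscience/bloom load the same way.
llm = LLM(model="bigscience/bloom-560m")

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```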

WoosukKwon avatar Jul 03 '23 20:07 WoosukKwon

Super nice~


createmomo avatar Jul 04 '23 14:07 createmomo

I tried using vLLM to speed up my BLOOM model, but found that the speed did not improve. Moreover, vLLM's memory usage is higher. What could be the reason?
vLLM: [screenshot of vLLM benchmark output]
HF: [screenshot of HF benchmark output]

Hukongtao avatar Jul 13 '23 04:07 Hukongtao

Hi @Hukongtao, thanks for trying out vLLM! The memory usage is high because vLLM pre-allocates space to store the KV cache. You can configure the memory usage by tuning the gpu_memory_utilization parameter, which is 0.9 (i.e., 90% of your GPU memory capacity) by default.
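For example, a sketch of lowering the memory budget (the checkpoint name is illustrative, not tied to your setup):

```python
from vllm import LLM

# Reserve only half of the GPU memory for vLLM instead of the
# default 90%; the pre-allocated KV cache shrinks accordingly.
llm = LLM(model="bigscience/bloom-560m", gpu_memory_utilization=0.5)
```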

As for the speed, could you share the model size you are using and your benchmark results?

WoosukKwon avatar Jul 14 '23 16:07 WoosukKwon

@WoosukKwon Thanks for your reply. Great job!

Hukongtao avatar Jul 16 '23 04:07 Hukongtao