Support BLOOM
BLOOM is an open-source LLM developed by BigScience. The BLOOM models rank highly in Hugging Face downloads. It'd be great to have these models in our catalog.
+1 - looking forward to Bloom in vLLM
@wangkuiyi @ruidongtd @createmomo @nuass @wengrx @rossbucky @bsabri We've just added BLOOM. You can immediately use it by installing vLLM from source.
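If you want to try it right away, here's a minimal sketch using the offline `LLM` API (the `bigscience/bloom-7b1` checkpoint name is just an example; any BLOOM variant on the Hugging Face Hub should work):

```python
from vllm import LLM, SamplingParams

# Download and load a BLOOM checkpoint from the Hugging Face Hub.
llm = LLM(model="bigscience/bloom-7b1")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```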
Super nice~
I used vLLM to try to speed up my BLOOM model, but found that the speed did not improve. Moreover, vLLM's memory usage is higher. What might be the reason?
vLLM: (screenshot not preserved)
HF: (screenshot not preserved)
Hi @Hukongtao, thanks for trying out vLLM! The memory usage is high because vLLM pre-allocates space for the KV cache. You can configure the memory usage by tuning the `gpu_memory_utilization` parameter, which is 0.9 (i.e., 90% of your GPU memory capacity) by default.
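For example (a minimal sketch; the checkpoint name and the 0.5 value are just illustrations):

```python
from vllm import LLM

# Cap vLLM's pre-allocation at 50% of GPU memory instead of the default 90%.
llm = LLM(model="bigscience/bloom-7b1", gpu_memory_utilization=0.5)
```

Note that a smaller value leaves less room for the KV cache, which can reduce the batch size vLLM can serve concurrently.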
As for speed, could you share the model (and its size) you are using, along with your benchmark results?
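In case it helps, here's one rough way to get a comparable throughput number (a wall-clock sketch under assumed settings, not an official benchmark script):

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="bigscience/bloom-7b1")  # example checkpoint
prompts = ["Hello, my name is"] * 32     # small batch for a rough number
params = SamplingParams(temperature=0.0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests to compute tokens/s.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/s")
```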
@WoosukKwon Thanks for your reply. Great job!