arkohut
Great question. I have a similar issue.
https://github.com/vllm-project/vllm/issues/2413 This may be helpful.
> The more code you have in the repo that isn't the main purpose of the project, the harder it is to maintain high quality code and quickly deliver features....
> Can you move it to `examples/gradio_openai_chatbot_webserver.py`?

It's done.
Do I need to make any other updates? @zhuohan123
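For anyone landing here, a minimal sketch of what a Gradio chatbot talking to a vLLM OpenAI-compatible server could look like; the `base_url`, API key, and model name below are assumptions for illustration, not the contents of the actual example file:

```python
import gradio as gr
from openai import OpenAI

# Assumed address of a locally running vLLM OpenAI-compatible server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not require a real key by default
)

def chat(message, history):
    # Rebuild the conversation in OpenAI chat format from Gradio's
    # (user, assistant) history pairs.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    response = client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed model name
        messages=messages,
    )
    return response.choices[0].message.content

gr.ChatInterface(chat).launch()
```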
Tried GPTQ and AWQ quantization of Mixtral-8x7B-Instruct-v0.1 and got quite different performance.

## GPU

A40 48G VRAM

## vLLM version

`0.2.6`. The latest version `0.2.7` will run out of memory for...
Sorry for the wrong info; during my test, AWQ was much faster than GPTQ. I have already updated the message.
So maybe the MoE model behaves quite differently under these quantization schemes?
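For reference, a rough sketch of how the two quantized checkpoints can be loaded through vLLM's offline `LLM` API; the TheBloke repository names are assumptions for illustration, not necessarily the checkpoints used in the test above:

```python
from vllm import LLM, SamplingParams

# AWQ variant (the faster one in my test); model name is an assumption.
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
    quantization="awq",
)

# The GPTQ variant would be loaded the same way:
# llm = LLM(model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ", quantization="gptq")

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```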
1. The particular feature is that it includes a free FRP server from Gradio, which is maintained by Hugging Face, so it looks quite stable (a minimal sketch of the share feature follows below); more info can be found in...
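For context, using that FRP tunnel amounts to a single flag in a Gradio app; this is a minimal sketch, not the code under discussion:

```python
import gradio as gr

# share=True tunnels the local app through Gradio's free FRP servers
# and prints a public *.gradio.live URL.
demo = gr.Interface(fn=lambda name: f"Hello, {name}!", inputs="text", outputs="text")
demo.launch(share=True)
```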
@anderspitman