633WHU

Results: 30 comments of 633WHU

At present, we have found a workaround: set the swap space directly to 0. This way, the CPU swap space is never used and it no longer reports any...
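
A minimal sketch of that workaround, assuming the offline `LLM` entry point (the model name is only an example; the OpenAI-compatible server accepts the equivalent `--swap-space 0` flag):

```python
# Workaround sketch: set vLLM's CPU swap space to 0 so requests are never
# swapped out to CPU memory (and the related error is no longer triggered).
from vllm import LLM, SamplingParams

llm = LLM(
    model="NousResearch/Llama-2-7b-chat-hf",  # example model only
    swap_space=0,  # GiB of CPU swap space per GPU; 0 disables CPU swapping
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```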

> Also encountered the same problem, how can it be solved? @viktor-ferenczi @WoosukKwon

At present, we have found a workaround: set the swap space directly to 0. This way,...

> ### **Bug Description**
> After running and testing vLLM successfully with [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) and [TheBloke/Llama-2-7b-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ), I changed the LLM to [vilm/vinallama-2.7b-chat](https://huggingface.co/vilm/vinallama-2.7b-chat) - a Llama-2-family model. This time the API server...

> I am facing the same problem. In our case we do not use multiple workers, but in K8s each pod is a worker running the server. Due to autoscaling...

Set your `streamable_http_path='/mcp/'`.
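
A minimal sketch of where that setting goes, assuming the MCP Python SDK's `FastMCP` server (the server name and `ping` tool are placeholders):

```python
# Sketch: configure the streamable HTTP transport to serve under "/mcp/"
# (note the trailing slash), matching the path the client is configured with.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server", streamable_http_path="/mcp/")

@mcp.tool()
def ping() -> str:
    """Simple health-check tool."""
    return "pong"

if __name__ == "__main__":
    # Defaults to http://127.0.0.1:8000/mcp/ unless host/port are overridden.
    mcp.run(transport="streamable-http")
```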