633WHU
At present we have found a workaround: set the swap space directly to 0. This way the CPU swap space is never called and it will not report any...
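For anyone who wants to try the same workaround, here is a minimal sketch, assuming vLLM's standard `swap_space` engine argument (the model name below is just the one from this thread, and the prompt is only illustrative):

```python
# Minimal sketch: disable CPU swap space entirely so vLLM never allocates
# or touches CPU swap blocks. `swap_space` is specified in GiB.
from vllm import LLM, SamplingParams

llm = LLM(
    model="NousResearch/Llama-2-7b-chat-hf",  # example model from this thread
    swap_space=0,                              # 0 GiB -> no CPU swap space
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The same thing should be achievable when launching the OpenAI-compatible server by passing `--swap-space 0` on the command line.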
> Also encountered the same problem. How can it be solved? @viktor-ferenczi @WoosukKwon

At present we have found a workaround: set the swap space directly to 0. This way the CPU swap space is never called and it will not report any...
> ### **Bug Description**
> After running and testing vLLM successfully with [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) and [TheBloke/Llama-2-7b-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ), I changed the LLM to [vilm/vinallama-2.7b-chat](https://huggingface.co/vilm/vinallama-2.7b-chat), a Llama-2-family model. This time the API server...
> I am facing the same problem. In our case we do not use multiple workers, but in K8s each pod is a worker running the server. Due to autoscaling...
Set your `streamable_http_path` to `'/mcp/'`.
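If this refers to the MCP Python SDK's FastMCP server, a minimal sketch of where that setting would go looks roughly like the following; the server name and the example tool are illustrative assumptions, not taken from this thread:

```python
# Minimal sketch, assuming the MCP Python SDK's FastMCP server.
from mcp.server.fastmcp import FastMCP

# Serve the Streamable HTTP endpoint under /mcp/ instead of the default path.
mcp = FastMCP("demo-server", streamable_http_path="/mcp/")

@mcp.tool()
def ping() -> str:
    """Placeholder tool so the server exposes something."""
    return "pong"

if __name__ == "__main__":
    # Run over the Streamable HTTP transport at the configured path.
    mcp.run(transport="streamable-http")
```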