vllm
When --gpu-memory-utilization is set to 0.9, the actual fraction of GPU memory used is more than 0.9
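For anyone wanting to check the observed fraction, here is a minimal sketch that reads used and total memory from nvidia-smi and computes the ratio per GPU. The helper names (parse_memory_csv, query_gpu_fractions) are made up for illustration and are not part of vLLM. Note that nvidia-smi reports everything in use on the device, not only what vLLM allocated.

```python
import subprocess


def parse_memory_csv(text):
    """Turn lines of 'used_mib, total_mib' into a list of used fractions."""
    fractions = []
    for line in text.strip().splitlines():
        used_mib, total_mib = (float(field) for field in line.split(","))
        fractions.append(used_mib / total_mib)
    return fractions


def query_gpu_fractions():
    """Ask nvidia-smi for per-GPU memory usage (requires an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling query_gpu_fractions() while the server is running returns one number per GPU, which can be compared against the configured 0.9.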
I don't know the vLLM internals well, but if you don't set worker-use-ray or engine-use-ray, some parts are not sent to ray.remote, so gpu_memory_utilization is not used in those cases.
Maybe the description of gpu-memory-utilization could be more explicit about that, but I'm not sure it's actually an issue.
+1, same question