wgx7054
> Please provide the OS, CUDA version, CPU, CPU RAM, GPU(s), GPU VRAM sizes, command line you started the vLLM with, model used and the full vLLM log output for...
> Does the problem happen on the first request or only after doing inference tasks for a while?
>
> Is this the vLLM API server or the OpenAI compatible...
> > After GPU KV cache usage reaches 100.0%, the server hangs, GPU utilization drops to 0, and it can no longer serve requests. Is there any way to fix this?
>
> What about using early releases of vllm? v0.1.4 or earlier?

@wgx7054 It doesn't seem to work. Have you tried an older...
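
For anyone hitting the same hang, a common mitigation while debugging is to leave headroom in the KV cache and cap how many sequences compete for it. A minimal offline sketch, assuming a vLLM version where `gpu_memory_utilization`, `max_num_seqs`, and `swap_space` are accepted engine arguments (verify against your installed version; the model name and all values are placeholders, not tuned recommendations):

```python
from vllm import LLM, SamplingParams

# Sketch: reserve some VRAM and cap in-flight sequences so the scheduler
# can preempt/swap instead of wedging when the KV cache fills up.
llm = LLM(
    model="facebook/opt-125m",    # placeholder model
    gpu_memory_utilization=0.85,  # leave headroom below the default 0.90
    max_num_seqs=64,              # cap sequences competing for KV blocks
    swap_space=8,                 # GiB of CPU swap for preempted sequences
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```

The OpenAI-compatible server exposes the same knobs as command-line flags (`--gpu-memory-utilization`, `--max-num-seqs`, `--swap-space`), so the same tuning applies when launching it directly.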