wgx7054
> Please provide the OS, CUDA version, CPU, CPU RAM, GPU(s), GPU VRAM sizes, command line you started the vLLM with, model used and the full vLLM log output for...
> Does the problem happen on the first request or only after doing inference tasks for a while?
>
> Is this the vLLM API server or the OpenAI compatible...
> > After GPU KV cache usage reaches 100.0%, the server hangs, GPU utilization drops to 0, and it can no longer serve requests. Is there any way to fix this?
>
> What about using early releases of vllm? v0.1.4 or earlier?

@wgx7054 It doesn't seem to work. Have you tried an older...
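
For anyone hitting the same hang, a common mitigation while debugging is to leave headroom in the KV cache and cap how many sequences compete for it. A minimal offline sketch, assuming a vLLM version where `gpu_memory_utilization`, `max_num_seqs`, and `swap_space` are accepted engine arguments (verify against your installed version; the model name and all values are placeholders, not tuned recommendations):

```python
from vllm import LLM, SamplingParams

# Sketch: reserve some VRAM and cap in-flight sequences so the scheduler
# can preempt/swap instead of wedging when the KV cache fills up.
llm = LLM(
    model="facebook/opt-125m",    # placeholder model
    gpu_memory_utilization=0.85,  # leave headroom below the default 0.90
    max_num_seqs=64,              # cap sequences competing for KV blocks
    swap_space=8,                 # GiB of CPU swap for preempted sequences
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```

The OpenAI-compatible server exposes the same knobs as command-line flags (`--gpu-memory-utilization`, `--max-num-seqs`, `--swap-space`), so the same tuning applies when launching it directly.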