Rahmat711
@amulil KV caching serves a different purpose in vLLM than the caching in Hugging Face Transformers.
Increase gpu_memory_utilization to 0.95 or 1.
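For reference, a minimal sketch of how that setting can be passed when constructing the engine (the model name below is just an example, not from this thread):

```python
from vllm import LLM, SamplingParams

# Reserve up to 95% of GPU memory for the model weights and KV cache.
# gpu_memory_utilization defaults to 0.90; raising it gives the KV cache
# more room, at the cost of less headroom for other GPU processes.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example model, adjust as needed
    gpu_memory_utilization=0.95,
)

outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

The API server accepts the equivalent flag `--gpu-memory-utilization 0.95` on the command line.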
@WoosukKwon the vLLM model runs indefinitely: it keeps processing those 2 requests forever when tried with Mistral 7B Instruct and does not respond to new requests. Is this a...
@viktor-ferenczi I am using version 0.2.6.