RobertLiu0905 comments


Results: 3 comments of RobertLiu0905

GPU KV cache usage: hangs after reaching 100.0%

I captured a flame graph, and the hot spot was `ops.gptq_gemm`, called from `apply_weights` in `gptq.py`. It may be exceeding the GPU's CUDA compute capacity. ![image](https://github.com/vllm-project/vllm/assets/10494702/7316fd86-7c0b-4b20-bb87-36bc13875783) ![image](https://github.com/vllm-project/vllm/assets/10494702/fbb7ced7-f930-4683-9787-0fd505413384)

[Bug]: with `worker_use_ray = true`, and tensor_parallel_size > 1, the process is pending forever

I also encountered this problem. I solved it by compiling NCCL from the [nccl source code](https://github.com/NVIDIA/nccl) and then changing the path to libnccl.so.2 in the vllm source code so that it points at the newly built library.
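
Before editing the path, it can help to confirm that the freshly built library actually loads and reports the version you expect. The following is a minimal sketch, not part of vllm: `decode_nccl_version` and `probe_nccl` are hypothetical helper names, and the path in the usage line is a placeholder for wherever your NCCL build put the library. It relies only on `ncclGetVersion`, which NCCL exports.

```python
import ctypes
import os


def decode_nccl_version(code: int) -> str:
    """Turn the integer from ncclGetVersion into 'major.minor.patch'."""
    # NCCL encodes versions as major*10000 + minor*100 + patch from 2.9 on,
    # and as major*1000 + minor*100 + patch for earlier releases.
    if code >= 10000:
        major, rest = divmod(code, 10000)
    else:
        major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def probe_nccl(so_path: str):
    """Try to load the library at so_path; return its version string or None."""
    if not os.path.isfile(so_path):
        return None
    try:
        lib = ctypes.CDLL(so_path)
        version = ctypes.c_int()
        # ncclGetVersion(int*) returns 0 (ncclSuccess) on success.
        status = lib.ncclGetVersion(ctypes.byref(version))
    except (OSError, AttributeError):
        # Not loadable (e.g. missing CUDA runtime) or not an NCCL library.
        return None
    return decode_nccl_version(version.value) if status == 0 else None


if __name__ == "__main__":
    # Placeholder path: substitute the location of your own build output.
    print(probe_nccl("/path/to/nccl/build/lib/libnccl.so.2"))
```

If this prints `None`, the library is missing or could not be loaded, and pointing vllm at it is unlikely to resolve the hang.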

Support for deepseek v2

Hoping for the same.