RobertLiu0905
I grabbed a flame chart, and the problem is in gptq.py, in apply_weights, at the call to ops.gptq_gemm. It may be beyond what CUDA compute can handle here. (Two flame chart screenshots were attached.)
I also encountered this problem. I solved it by building NCCL from the [nccl source code](https://github.com/NVIDIA/nccl) and then changing the path to libnccl.so.2 in the vLLM source code.
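For anyone trying this workaround, a minimal sketch for checking that a self-built libnccl.so.2 actually loads before pointing vLLM at it. The helper name `nccl_version_code` and the example path are hypothetical and not part of vLLM; the only library call assumed is NCCL's `ncclGetVersion(int *version)`.

```python
import ctypes
import os


def nccl_version_code(lib_path):
    """Load the NCCL shared library at lib_path and return its raw version code.

    Hypothetical helper for illustration; it is not part of vLLM or NCCL.
    """
    if not os.path.isfile(lib_path):
        raise FileNotFoundError(f"no NCCL library found at {lib_path}")

    # CDLL raises OSError if the file exists but cannot be loaded
    # (for example, a missing CUDA runtime dependency).
    lib = ctypes.CDLL(lib_path)

    version = ctypes.c_int()
    # ncclGetVersion writes the version code into the int pointer
    # and returns 0 (ncclSuccess) when it succeeds.
    status = lib.ncclGetVersion(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"ncclGetVersion failed with status {status}")
    return version.value


# Example call (the path is a placeholder for wherever your build put the library):
# print(nccl_version_code("/path/to/nccl/build/lib/libnccl.so.2"))
```

If the call returns a version code without raising, the library is loadable and matches the build you expect, so a wrong path or a broken build can be ruled out before editing anything in vLLM.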
Expect the same