Disable cuda version check in vllm-openai image
Fix #4521
Currently there is no need to check the CUDA version when using the fp8 KV cache. As of now, vLLM's binaries are compiled with CUDA 12.1 and the public PyTorch release versions by default, and the vllm-openai image also ships with CUDA 12.1.
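For context, a version guard like the one being removed typically compares the CUDA toolkit PyTorch was built against with a required minimum and raises otherwise. The sketch below is a minimal illustration of that pattern; the function name `check_cuda_version` and its threshold are hypothetical and not taken from vLLM's actual code.

```python
import torch


def check_cuda_version(min_major: int = 12, min_minor: int = 1) -> None:
    """Hypothetical sketch of a CUDA version guard, not vLLM's actual code.

    Raises if the CUDA toolkit PyTorch was built with is older than
    the required minimum.
    """
    cuda = torch.version.cuda  # e.g. "12.1"; None for CPU-only builds
    if cuda is None:
        raise RuntimeError("CUDA is required for the fp8 KV cache.")
    major, minor = (int(part) for part in cuda.split(".")[:2])
    if (major, minor) < (min_major, min_minor):
        raise RuntimeError(
            f"fp8 KV cache requires CUDA >= {min_major}.{min_minor}, "
            f"but PyTorch was built with CUDA {cuda}."
        )
```

Since both the published wheels and the vllm-openai image are built against CUDA 12.1, a guard of this kind is redundant in that image, which is the rationale for removing it rather than relaxing its threshold.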
Sorry, I just merged the other PR, can you resolve the conflict?
🤦‍♂️ Sorry, another conflict.
@simon-mo The conflict is resolved. Please take a look.