Disable cuda version check in vllm-openai image
Fix #4521
Currently there is no need to check the CUDA version when using the fp8 KV cache. As of now, vLLM's binaries are compiled with CUDA 12.1 and the public PyTorch release versions by default, and the vllm-openai image also ships with CUDA 12.1.
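For context, a version guard like the one being removed typically compares the CUDA toolkit PyTorch was built against with a required minimum and raises otherwise. The sketch below is a minimal illustration of that pattern; the function name `check_cuda_version` and its threshold are hypothetical and not taken from vLLM's actual code.

```python
import torch


def check_cuda_version(min_major: int = 12, min_minor: int = 1) -> None:
    """Hypothetical sketch of a CUDA version guard, not vLLM's actual code.

    Raises if the CUDA toolkit PyTorch was built with is older than
    the required minimum.
    """
    cuda = torch.version.cuda  # e.g. "12.1"; None for CPU-only builds
    if cuda is None:
        raise RuntimeError("CUDA is required for the fp8 KV cache.")
    major, minor = (int(part) for part in cuda.split(".")[:2])
    if (major, minor) < (min_major, min_minor):
        raise RuntimeError(
            f"fp8 KV cache requires CUDA >= {min_major}.{min_minor}, "
            f"but PyTorch was built with CUDA {cuda}."
        )
```

Since both the published wheels and the vllm-openai image are built against CUDA 12.1, a guard of this kind is redundant in that image, which is the rationale for removing it rather than relaxing its threshold.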
Sorry, I just merged the other PR, can you resolve the conflict?
🤦‍♂️ Sorry, another conflict.
@simon-mo The conflict is resolved. Please take a look.