John Hawkins comments


Results: 2 comments of John Hawkins

[how-to-run-inference-cloud-run-gpu-vllm]:

My best bet at this time is that the problem is caused by an incompatibility in CUDA support. The vLLM containers are built for CUDA 12.4, but Google Cloud Run...

[how-to-run-inference-cloud-run-gpu-vllm]:

Thanks for the rapid response and solution. Much appreciated, @saraford.