Marut Pandya
Note: You can only load one model at a time, hence in Quick Deploy only a single model is assigned.
I've created a custom Docker image `runpod/worker-v1-vllm:v2.8.0gptoss-cuda12.8.1` that allows us to deploy the openai/gpt-oss-* models on RunPod Serverless. This is an experimental release and subject to change as vLLM adds full...
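In case it helps, here is a minimal sketch of how an endpoint built from this image could be queried once it is running, assuming it exposes the standard OpenAI-compatible route of the RunPod vLLM worker. `RUNPOD_ENDPOINT_ID`, `RUNPOD_API_KEY`, and the `openai/gpt-oss-20b` model name are placeholders for your own values, not details taken from the image above.

```python
# Minimal sketch: query a RunPod Serverless endpoint running the vLLM worker
# through its OpenAI-compatible route. Endpoint ID, API key, and model name
# are placeholders (assumptions), not values from the deployment above.
import os

from openai import OpenAI

client = OpenAI(
    # The RunPod vLLM worker exposes an OpenAI-compatible API under /openai/v1.
    base_url=f"https://api.runpod.ai/v2/{os.environ['RUNPOD_ENDPOINT_ID']}/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # use whichever gpt-oss variant you deployed
    messages=[{"role": "user", "content": "Hello from RunPod Serverless!"}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```

Since only one model can be loaded at a time, the `model` field should match the single model assigned to the endpoint at deploy time.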