worker-vllm

[EXPERIMENTAL] Deploy openai/gpt-oss-* on runpod serverless.

Open pandyamarut opened this issue 4 months ago • 3 comments

I've created a custom Docker image, runpod/worker-v1-vllm:v2.8.0gptoss-cuda12.8.1, that allows us to deploy the openai/gpt-oss-* models on RunPod Serverless. This is an experimental release and subject to change as vLLM adds full support in future versions.

To deploy, simply use this image in your RunPod Serverless endpoint with the environment variable MODEL_NAME=openai/gpt-oss-20b, and ensure you have adequate GPU memory and are using a supported GPU type.
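As a rough sketch, the endpoint template described above boils down to environment variables like the following. Only MODEL_NAME is named in this issue; the commented-out variables are illustrative assumptions based on common vLLM worker settings, so check the worker-vllm README for the actual supported set:

```shell
# Environment variables for the RunPod Serverless endpoint template.
# MODEL_NAME is the only variable mentioned in this issue.
export MODEL_NAME="openai/gpt-oss-20b"

# The variables below are assumptions, not confirmed worker-vllm settings:
# export MAX_MODEL_LEN="8192"        # assumed: cap context length to fit GPU memory
# export TENSOR_PARALLEL_SIZE="1"    # assumed: GPUs per worker for larger variants
```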

For detailed information about the gpt-oss model and its capabilities, please refer to the official vLLM guide: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
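To illustrate, a request to a worker-vllm serverless endpoint is typically a JSON body with an "input" wrapper containing a prompt and sampling parameters. The field names below follow worker-vllm's documented input schema as I understand it, but treat them as assumptions and verify against the current worker-vllm README:

```python
import json

# Hypothetical request body for a RunPod Serverless worker-vllm endpoint.
# The "input" wrapper with "prompt" and "sampling_params" reflects the
# worker-vllm schema; exact supported fields may differ by version.
payload = {
    "input": {
        "prompt": "Explain mixture-of-experts in one sentence.",
        "sampling_params": {
            "max_tokens": 128,    # cap on generated tokens
            "temperature": 0.7,   # sampling temperature
        },
    }
}

# Serialize as it would be sent in the POST body to /run or /runsync.
body = json.dumps(payload)
print(body)
```

You would POST this body to the endpoint's /run or /runsync URL with your RunPod API key in the Authorization header.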

pandyamarut avatar Aug 12 '25 00:08 pandyamarut

does this support mxfp4 quantization too?

nerdylive123 avatar Aug 14 '25 04:08 nerdylive123

Does it work for anyone? It didn't run on an L40S or H100 for me.

HoMi264 avatar Aug 22 '25 16:08 HoMi264

#220

Permafacture avatar Oct 12 '25 19:10 Permafacture