[EXPERIMENTAL] Deploy openai/gpt-oss-* on RunPod Serverless.
I've created a custom Docker image, runpod/worker-v1-vllm:v2.8.0gptoss-cuda12.8.1, that lets you deploy the openai/gpt-oss-* models on RunPod Serverless. This is an experimental release and subject to change as vLLM adds full support in future versions.
To deploy, simply use this image in your RunPod Serverless endpoint with the environment variable MODEL_NAME=openai/gpt-oss-20b, and ensure you have adequate GPU memory and a supported GPU type. Once the endpoint is live, you can query it as sketched below.
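Here is a minimal sketch of sending a request to the deployed endpoint. It assumes the standard worker-vllm job payload (`{"input": {"prompt": ..., "sampling_params": ...}}`) and RunPod's `runsync` API; `ENDPOINT_ID` is a placeholder you'd copy from the RunPod console, and the exact payload shape may differ in this experimental build.

```python
# Minimal sketch: query a RunPod Serverless endpoint running the
# worker-v1-vllm gpt-oss image. ENDPOINT_ID and the payload fields are
# assumptions based on the standard worker-vllm format; adjust as needed.
import os

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: copy from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Explain what gpt-oss-20b is in one sentence.",
            "sampling_params": {"max_tokens": 128, "temperature": 0.7},
        }
    },
    timeout=600,  # cold starts on large models can take a while
)
print(response.json())
```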
For detailed information about the gpt-oss model and its capabilities, please refer to the official vLLM guide: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
Does this support MXFP4 quantization too?
Does it work for anyone? It didn't run on an L40S or H100 for me.
#220