[EXPERIMENTAL] Deploy openai/gpt-oss-* on RunPod Serverless.
I've created a custom Docker image, runpod/worker-v1-vllm:v2.8.0gptoss-cuda12.8.1, that lets you deploy the openai/gpt-oss-* models on RunPod Serverless. This is an experimental release and subject to change as vLLM adds full support in future versions.
To deploy, simply use this image in your RunPod Serverless endpoint with the environment variable MODEL_NAME=openai/gpt-oss-20b, and ensure you have adequate GPU memory and a supported GPU type. Once the endpoint is live, you can query it as sketched below.
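Here is a minimal sketch of sending a request to the deployed endpoint. It assumes the standard worker-vllm job payload (`{"input": {"prompt": ..., "sampling_params": ...}}`) and RunPod's `runsync` API; `ENDPOINT_ID` is a placeholder you'd copy from the RunPod console, and the exact payload shape may differ in this experimental build.

```python
# Minimal sketch: query a RunPod Serverless endpoint running the
# worker-v1-vllm gpt-oss image. ENDPOINT_ID and the payload fields are
# assumptions based on the standard worker-vllm format; adjust as needed.
import os

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: copy from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Explain what gpt-oss-20b is in one sentence.",
            "sampling_params": {"max_tokens": 128, "temperature": 0.7},
        }
    },
    timeout=600,  # cold starts on large models can take a while
)
print(response.json())
```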
For detailed information about the gpt-oss model and its capabilities, please refer to the official vLLM guide: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
Does this support MXFP4 quantization too?
Does it work for anyone? It didn't run on an L40S or H100 for me.
#220