
Unable to run gpt_oss model type

Open hoblin opened this issue 4 months ago • 5 comments

A serverless worker can't start with openai/gpt-oss-20b model

Value error, The checkpoint you are trying to load has model type `gpt_oss` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

The fix for this error is already listed on OpenAI's gpt-oss page.

TL;DR: `!pip install -U "transformers>=4.55.0" kernels torch==2.6.0` (the quotes around the version spec keep `>=` from being treated as a shell redirection).
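For reference, here is a minimal sketch of checking whether the installed Transformers meets that minimum before serving. The helper name and the naive `X.Y.Z` parsing are my own illustration, not from the thread (it ignores pre-release suffixes):

```python
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(pkg: str, minimum: str) -> bool:
    """Return True if `pkg` is installed at version >= `minimum`.

    Naive comparison on the first three numeric components; assumes
    plain X.Y.Z version strings (no pre-release tags).
    """
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return False

    def parse(v: str) -> tuple:
        parts = []
        for piece in v.split(".")[:3]:
            digits = "".join(ch for ch in piece if ch.isdigit())
            parts.append(int(digits) if digits else 0)
        return tuple(parts)

    return parse(installed) >= parse(minimum)

# Example: gate startup on the Transformers version the error asks for.
# if not meets_minimum("transformers", "4.55.0"):
#     raise SystemExit("transformers >= 4.55.0 required for gpt_oss")
```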

hoblin avatar Aug 07 '25 05:08 hoblin

It's a bit more complicated than that. The fix you quote is for using gpt-oss-* within Google Colab. To run a gpt-oss-* model in vLLM, you need to use this feature branch. I suspect this template won't see gpt-oss-* support until that vLLM feature branch is merged upstream.

Staberinde avatar Aug 11 '25 20:08 Staberinde

Thanks. I'm already looking into building a custom worker based on llama-server, but that's not a priority for me at the moment. My test endpoint is running on a stripped-down ollama-worker, and until I need better support for JSON schema and function calls, ollama will do the job. At least at the POC stage, which is my case =)
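Since JSON-schema and function-call support is the deciding factor here, a sketch of the kind of structured-output request involved: an OpenAI-compatible `/v1/chat/completions` payload using `response_format` with a JSON schema (the payload shape follows the OpenAI API; the helper name and schema are hypothetical):

```python
def build_schema_request(model: str, prompt: str, schema: dict) -> dict:
    """Build an OpenAI-compatible chat-completions payload that asks the
    server to constrain its output to the given JSON schema. vLLM's
    OpenAI-compatible server accepts this shape; ollama's structured-output
    support is more limited, as noted above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "result", "schema": schema},
        },
    }

# Usage: POST this dict as JSON to the worker's /v1/chat/completions endpoint.
payload = build_schema_request(
    "openai/gpt-oss-20b",
    "List two colors.",
    {"type": "object",
     "properties": {"colors": {"type": "array", "items": {"type": "string"}}}},
)
```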

hoblin avatar Aug 11 '25 21:08 hoblin

https://github.com/runpod-workers/worker-vllm/issues/210 @hoblin @Staberinde. Let me know if you face any issues. I can take a look. Thanks.

pandyamarut avatar Aug 12 '25 00:08 pandyamarut

The feature branch cited above has been merged, and vLLM now generally supports GPT-OSS. It would be nice if the official image were updated so we could get this, and any other improvements from the past two months, through the widget in the serverless UI.

Permafacture avatar Oct 06 '25 19:10 Permafacture

#220

Permafacture avatar Oct 12 '25 19:10 Permafacture