Stream-only model reports "simulated streaming failed during non-streaming call"
as below:

```
[root@k8s-test ~]# kubectl-ai --llm-provider=openai --model=qwen3-235b-a22b
Hey there, what can I help you with today?
show me all deployment in the deafult namespace
E0508 11:30:21.562983 32248 openai.go:249] OpenAI ChatCompletion API error: POST "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions": 400 Bad Request {"code":"invalid_parameter_error","param":null,"message":"This model only support stream mode, please enable the stream parameter to access the model. ","type":"invalid_request_error"}
Error: simulated streaming failed during non-streaming call: OpenAI chat completion failed: POST "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions": 400 Bad Request {"code":"invalid_parameter_error","param":null,"message":"This model only support stream mode, please enable the stream parameter to access the model. ","type":"invalid_request_error"}
exit
[root@k8s-test ~]# kubectl-ai --llm-provider=openai --model=qwen-plus
Hey there, what can I help you with today?
show me all deployment in the deafult namespace
Running: kubectl get deployments -n deafult
```
Thanks @limboys for reporting the issue.
I understand we are using the OpenAI-compatible llm-provider here, but what is the underlying provider? (Alibaba?)
Looks like we might have to provide a command-line option to toggle stream mode. I am not sure whether the models API exposes this capability; otherwise the smart thing would be to detect it from the model configuration.
/cc @tuannvm @hakman @justinsb
I am hitting the same problem:
`Error: simulated streaming failed during non-streaming call`
The issue is that this model supports streaming mode only:
https://www.alibabacloud.com/help/en/model-studio/stream
According to that doc, the open-source Qwen3 models (along with some other model families listed there) only support streaming output.
The openai.go implementation does not have streaming enabled:
https://github.com/GoogleCloudPlatform/kubectl-ai/blob/main/gollm/openai.go#L122-L129
Perhaps the best option is to move this into its own LLM file, as was done for Grok, and add streaming capabilities for newer models:
```go
chatReq := openai.ChatCompletionNewParams{
    Model: openai.ChatModel(req.Model),
    Messages: []openai.ChatCompletionMessageParamUnion{
        openai.UserMessage(req.Prompt),
    },
    Stream: true, // Required for Qwen3-235B and other stream-only models
}
```
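For what it's worth, as far as I can tell, in recent versions of the openai-go SDK the stream parameter is not a field on `ChatCompletionNewParams`; streaming is requested through the `NewStreaming` helper and the chunks can be re-assembled with `ChatCompletionAccumulator`. A minimal, self-contained sketch (the endpoint, model, and prompt are purely illustrative, and this is not the actual gollm code):

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

func main() {
	ctx := context.Background()
	// Illustrative: point the OpenAI-compatible client at DashScope for Qwen.
	client := openai.NewClient(
		option.WithAPIKey(os.Getenv("OPENAI_API_KEY")),
		option.WithBaseURL("https://dashscope.aliyuncs.com/compatible-mode/v1"),
	)

	// NewStreaming sets the stream parameter on the request for us.
	stream := client.Chat.Completions.NewStreaming(ctx, openai.ChatCompletionNewParams{
		Model: openai.ChatModel("qwen3-235b-a22b"),
		Messages: []openai.ChatCompletionMessageParamUnion{
			openai.UserMessage("show me all deployments in the default namespace"),
		},
	})

	// Re-assemble the streamed chunks into a single completion.
	acc := openai.ChatCompletionAccumulator{}
	for stream.Next() {
		acc.AddChunk(stream.Current())
	}
	if err := stream.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "streaming chat completion failed:", err)
		os.Exit(1)
	}
	if len(acc.Choices) > 0 {
		fmt.Println(acc.Choices[0].Message.Content)
	}
}
```

Since the accumulator ends up looking like a regular ChatCompletion, the rest of the non-streaming response handling could probably stay largely unchanged.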
We should still allow non-streaming for older models. As a workaround in the meantime, if you can, switch to an older model and try that.
Hope this helps!
Hmm, I wonder if it would be possible to provide a flag to control the addition of extra parameters, or a separate stream flag to enable or disable streaming requests, e.g. `kubectl-ai --llm-provider=openai --model=qwen3-235b-a22b --stream=true`.
What if we could add logic to determine whether the models support streaming? The flag seems reasonable to me, but it wouldn’t provide the best user experience, since users would need to know which models support streaming. In my opinion, the tool should fail fast and notify the user before entering prompt mode.
@droot I can take this
> What if we could add logic to determine whether the models support streaming? The flag seems reasonable to me, but it wouldn’t provide the best user experience, since users would need to know which models support streaming. In my opinion, the tool should fail fast and notify the user before entering prompt mode.
While this would be interesting, it means we would need to keep track of the models ourselves(?). Also, some models, like the OpenAI ones, offer streaming as a feature (see their docs). Either way, streaming should be a feature in order to cover all models: either we implement automatic model detection or we allow a --streaming true/false flag.
Agree with @zvdy, streaming as a feature will be needed anyway.
Remember, our goal here is to unblock users so they can use whatever model meets their needs best (many factors go into why a model is best for them...).
Acceptable UX:
1. Add streaming as a feature in the gollm layer and add a command-line flag (--use-streaming) to enable/disable the feature (in both the kubectl-ai and k8s-bench CLIs). [Note we had to do this for tool use; that's why the --enable-tool-use flag exists today.]
2. `kubectl-ai` can detect invalid stream-mode use (based on the error signature) and suggest that the user re-invoke the tool with the command-line flag as a workaround. (We can also document this; a rough sketch of the detection follows below.)
Note: if supporting this flag for all the llm-providers turns out to be tricky, we can introduce an llm-specific flag instead if that keeps the implementation simple. It will depend on what the implementation looks like.
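Something along these lines (purely a sketch: the helper name and the --use-streaming flag are hypothetical, and the error text is copied from the report above) could cover the detect-and-suggest part:

```go
package gollm // illustrative placement only

import (
	"fmt"
	"strings"
)

// suggestStreamingWorkaround is a hypothetical helper: if the provider
// rejected a non-streaming call because the model is stream-only (the
// error signature seen with DashScope/Qwen above), wrap the error with a
// hint to retry with the proposed --use-streaming flag.
func suggestStreamingWorkaround(err error) error {
	if err == nil {
		return nil
	}
	if strings.Contains(err.Error(), "only support stream mode") {
		return fmt.Errorf("%w\nhint: this model only supports streaming responses; retry with --use-streaming", err)
	}
	return err
}
```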
Auto-detecting/tracking model capabilities ourselves is going to be a maintenance nightmare and will increase toil on us maintainers, so it is not desirable. I doubt that llm-providers expose APIs to discover this capability easily, but I could be wrong.
@droot @zvdy please take a look: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/209
@limboys @zackertypical @tuannvm can you please help test this PR https://github.com/GoogleCloudPlatform/kubectl-ai/pull/215 and confirm whether it solves your issue?
Thank you so much.
How do I build it? @droot
Still working on this. For some reason, OpenAI tool_calls differ from the Grok ones: the arguments are split and chunked, so while streaming works for Grok models, it does not yet work for OpenAI models.
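For anyone following along, the behaviour being described is that each streamed chunk only carries a fragment of a tool call's JSON arguments, keyed by the tool call index, so the fragments have to be stitched back together before the tool can be executed. A rough sketch assuming the openai-go chunk types (illustrative, not the actual PR code):

```go
package gollm // illustrative placement only

import (
	"strings"

	"github.com/openai/openai-go"
)

// accumulateToolCallArgs stitches streamed tool-call argument fragments back
// into complete JSON strings, keyed by tool call index. Illustrative only;
// openai-go's ChatCompletionAccumulator can also do this stitching.
func accumulateToolCallArgs(chunks []openai.ChatCompletionChunk) map[int]string {
	builders := map[int]*strings.Builder{}
	for _, chunk := range chunks {
		if len(chunk.Choices) == 0 {
			continue
		}
		for _, tc := range chunk.Choices[0].Delta.ToolCalls {
			idx := int(tc.Index)
			if builders[idx] == nil {
				builders[idx] = &strings.Builder{}
			}
			// Arguments arrive as partial JSON; append the fragments in order.
			builders[idx].WriteString(tc.Function.Arguments)
		}
	}
	args := map[int]string{}
	for idx, b := range builders {
		args[idx] = b.String()
	}
	return args
}
```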
Regarding the build, you would need to `git clone` my branch referenced in #215 (the branch the PR was opened from) and `go build` from source there.
Anyhow, I will come back to this issue once I manage to fix and test the solution for that specifically!
Finished the implementation and tested the supported models (Qwen, OpenAI, etc.); all work. Feel free to test it: #215
There are a few caveats about Qwen and Alibaba Cloud: point to the correct endpoint and have a look at https://github.com/GoogleCloudPlatform/kubectl-ai/pull/215#issuecomment-2870598371
- @zackertypical @limboys
Awesome, thanks @zvdy! If everything looks good, we can cut a release today.