
Add --no-stream flag for reasoning models

gwpl opened this issue 3 months ago · 1 comment

AI Assistant:

Summary

This PR adds a minimal --no-stream CLI flag to disable response streaming, resolving failures with reasoning models (o1, o3, o3-pro) that either require organization verification before streaming is allowed or don't support streaming at all on certain platforms (Azure OpenAI).

Fixes #430

Problem

Currently, mods hardcodes streaming for all API calls, which causes 400 errors for:

  • o3/o3-mini/o3-pro models without org verification: the API rejects requests with "param": "stream", "code": "unsupported_value"
  • o1 models on Azure OpenAI, where streaming is not supported
  • Any other reasoning model with streaming restrictions

Solution

Add a --no-stream flag that:

  • Disables streaming when set
  • Uses the regular Chat Completions API instead of the streaming API
  • Keeps backward compatibility (streaming remains enabled by default)
  • Implements a NonStreamingWrapper that exposes the same stream.Stream interface to callers

Changes

1. config.go (2 changes)

  • Add NoStream bool field to Config struct (line 148)
  • Add help text for --no-stream flag (line 47)

2. main.go (1 change)

  • Register --no-stream CLI flag (line 270)
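
A rough Go sketch of what changes 1 and 2 might look like together. It is illustrative only: the exact Config struct tags, any help-text plumbing, and the flags variable are assumptions about how mods wires up its other boolean flags, not a copy of the diff.

// config.go (sketch): a boolean field on Config, defaulting to false so
// streaming stays on unless the user opts out.
type Config struct {
    // ...existing fields...
    NoStream bool
}

// main.go (sketch): register the flag next to the existing ones.
// `flags` stands in for whatever cobra/pflag flag set mods already uses.
flags.BoolVar(&config.NoStream, "no-stream", false,
    "Disable streaming of responses (useful for reasoning models without streaming support)")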

3. internal/proto/proto.go (1 change)

  • Add Stream *bool field to Request struct (line 78)
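
Using a pointer keeps the field three-valued, which is presumably the reason for *bool over bool: nil means "no preference, keep the default (streaming)", while a non-nil false turns streaming off. A minimal sketch, with the surrounding fields omitted:

// internal/proto/proto.go (sketch, not the actual struct layout)
type Request struct {
    // ...existing fields (model, messages, ...)...

    // Stream is nil when the caller has no preference (streaming stays on
    // by default) and points to false when --no-stream was given.
    Stream *bool
}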

4. mods.go (1 change)

  • Pass NoStream config to request (lines 440-443)
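
The hand-off from config to request is then a few lines; the variable names below are placeholders, not the real identifiers from mods.go:

// mods.go (sketch): translate the CLI flag into the request field.
if config.NoStream {
    stream := false // explicitly ask the provider for a non-streamed response
    request.Stream = &stream
}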

5. internal/openai/openai.go (2 changes)

  • Implement conditional streaming logic (lines 96-106)
  • Add NonStreamingWrapper struct implementing stream.Stream (lines 197-254)
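
For readers who want the shape of this change without opening the diff, here is a rough Go sketch of both pieces. It is a sketch under assumptions, not the actual implementation: the stream.Stream interface is assumed to expose something like Recv/Close, toChatRequest is a hypothetical helper, and the client is assumed to wrap the sashabaranov/go-openai types the project has historically used.

// internal/openai/openai.go (sketch, not the actual diff)
// Assumed imports: context, io, github.com/sashabaranov/go-openai,
// plus the project's internal proto and stream packages.

// Conditional streaming: fall back to a blocking call only when the
// request explicitly carries Stream == false.
func (c *Client) request(ctx context.Context, req proto.Request) stream.Stream {
    if req.Stream != nil && !*req.Stream {
        resp, err := c.api.CreateChatCompletion(ctx, toChatRequest(req)) // one-shot call
        return &NonStreamingWrapper{resp: resp, err: err}
    }
    // ...existing streaming path unchanged...
}

// NonStreamingWrapper adapts a single chat-completion response to the
// stream.Stream interface: it yields the whole message as one chunk,
// then reports end-of-stream.
type NonStreamingWrapper struct {
    resp openai.ChatCompletionResponse
    err  error
    done bool
}

func (w *NonStreamingWrapper) Recv() (string, error) {
    if w.err != nil {
        return "", w.err // surface the API error to the caller
    }
    if w.done {
        return "", io.EOF // second call: nothing left to deliver
    }
    w.done = true
    return w.resp.Choices[0].Message.Content, nil
}

func (w *NonStreamingWrapper) Close() error { return nil }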

Testing

Basic functionality tests

# Non-streaming mode works
echo "what is 2+2?" | ./mods -m gpt-4o-mini --no-stream
# Output: 2 + 2 equals 4.

# Streaming mode still works (default)
echo "what is 3+3?" | ./mods -m gpt-4o-mini
# Output: 3 + 3 equals 6.

# Help text includes new flag
./mods --help | grep no-stream
# Output: --no-stream  Disable streaming of responses (useful for reasoning models without streaming support)

Real-world o3 model test (proves the fix works)

# WITHOUT --no-stream: FAILS with streaming error
echo "test" | ./mods -m o3
# ERROR: 400 Bad Request
# "message": "Your organization must be verified to stream this model..."
# "param": "stream"
# "code": "unsupported_value"

# WITH --no-stream: WORKS!
echo "test" | ./mods -m o3 --no-stream
# Output: Hello! It looks like you're testing the chat. How can I help you today?
# Conversation saved: 6f469bf test

This proves the --no-stream flag successfully resolves issue #430 for o3/o3-pro reasoning models.

Scope & Future Work

This PR implements ONLY the CLI flag. Per-model configuration in config_template.yml is intentionally left for future work to keep this change minimal and focused on resolving the immediate issue.

Future enhancements could include:

  • Per-model stream: false configuration in config files
  • Auto-detection of models that don't support streaming
  • Azure OpenAI specific handling

Backward Compatibility

✅ Fully backward compatible:

  • Default behavior unchanged (streaming enabled)
  • Existing scripts/workflows continue to work
  • New flag is opt-in only

🤖 Generated with Claude Code

gwpl · Oct 01 '25, 21:10

AI Assistant:

CI Lint Status Update

The macOS lint failure is due to a pre-existing issue in the upstream codebase, not introduced by this PR.

Error: os/exec.Command must not be called. use os/exec.CommandContext (noctx) at mods.go:485

Analysis:

  • This error is at line 485 of mods.go
  • The --no-stream implementation only modifies lines 440-443 of mods.go
  • Line 485 is unrelated to the streaming changes
  • This noctx violation exists in the base branch
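
For context, the usual noctx fix is a one-line change at the offending call site, assuming a context is already in scope there (the names below are placeholders):

// Flagged by noctx: the spawned process cannot be canceled.
cmd := exec.Command(name, args...)

// Accepted by noctx: the process is tied to a context and is killed
// when that context is canceled.
cmd := exec.CommandContext(ctx, name, args...)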

Files actually modified by this PR:

  1. config.go (lines 47, 148)
  2. main.go (line 270)
  3. internal/proto/proto.go (line 78)
  4. mods.go (lines 440-443)
  5. internal/openai/openai.go (lines 96-254)

Build Status: ✅ All build checks pass on Ubuntu, macOS, and Windows

Testing: Successfully tested with o3 model:

  • WITHOUT --no-stream: 400 error (streaming not allowed)
  • WITH --no-stream: Works successfully ✅

Request: The maintainers may want to address the pre-existing noctx issue separately, or accept this PR despite the lint warning since it's unrelated to the changes introduced here.

gwpl · Oct 02 '25, 16:10