# Fix SSE streaming format to comply with specification
- [x] Understand the current streaming implementation and identify issues
- [x] Fix SSE format in chat.go to explicitly use proper formatting
- [x] Fix SSE format in completion.go to explicitly use proper formatting
- [x] Ensure consistency across all streaming responses
- [x] Verify changes with format validation tests
- [x] Run code quality checks (go vet, gofmt)
- [x] Address code review feedback for error handling
- [x] Fix finish_reason to use pointer for proper null handling
- [x] Remove empty content from initial streaming chunks
- [x] Ensure proper stream termination with finish_reason
- [x] Improve code consistency
- [x] Move common finish reasons to constants
## Complete Solution
This PR fixes all streaming response issues causing OpenAI API client failures:
### SSE Format Fixes ✅

- **Explicit SSE formatting**: `data: {json}\n\n` with a proper `%s` specifier (sketched below)
- **Error message streaming**: errors are sent as proper JSON with correct SSE format
- **Robust error handling**: explicit marshal-error checking with fallbacks
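As a concrete illustration, here is a minimal sketch of the kind of explicit SSE write described above. The helper name, writer type, and fallback payload are assumptions for illustration, not LocalAI's actual code:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// writeSSEChunk marshals a chunk and writes it with explicit
// "data: <json>\n\n" framing. Marshal errors are checked, and a
// well-formed SSE error event is sent as a fallback so the stream
// is never corrupted by a raw Go error string.
func writeSSEChunk(w *bufio.Writer, v any) error {
	payload, err := json.Marshal(v)
	if err != nil {
		fmt.Fprint(w, "data: {\"error\":\"failed to marshal streaming response\"}\n\n")
		w.Flush()
		return err
	}
	// Explicit %s specifier, as described above.
	fmt.Fprintf(w, "data: %s\n\n", payload)
	return w.Flush()
}

func main() {
	w := bufio.NewWriter(os.Stdout)
	writeSSEChunk(w, map[string]string{"content": "Hello"})
}
```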
### OpenAI Compliance Fixes ✅

- **Empty finish_reason**: changed the schema field to `*string`
  - Streaming chunks: `"finish_reason": null`
  - Final chunk: `"finish_reason": "stop"`
- **Empty initial chunks**: the initial message now sends only the role
  - Before: `{"delta": {"role": "assistant", "content": ""}}`
  - After: `{"delta": {"role": "assistant"}}`
- **Proper stream termination**: the final chunk has an empty delta with `finish_reason`
  - Before: `{"delta": {"content": ""}, "finish_reason": "stop"}`
  - After: `{"delta": {}, "finish_reason": "stop"}` + `data: [DONE]`
### Code Quality Improvements ✅

- **Constants for finish reasons**: centralized constants prevent typos and improve maintainability (see the sketch after this list)
  - `FinishReasonStop = "stop"`
  - `FinishReasonToolCalls = "tool_calls"`
  - `FinishReasonFunctionCall = "function_call"`
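The constant values come straight from this PR; a plausible shape for the new `constants.go` file (exact layout assumed) would be:

```go
package openai

// Finish reasons returned in the finish_reason field of chat and
// completion responses, centralized to prevent typos.
const (
	FinishReasonStop         = "stop"
	FinishReasonToolCalls    = "tool_calls"
	FinishReasonFunctionCall = "function_call"
)
```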
### Streaming Flow (OpenAI Compliant)

```text
// 1. Initial chunk - only role, finish_reason is null
data: {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}

// 2. Content chunks - finish_reason stays null
data: {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": null}]}
data: {"choices": [{"index": 0, "delta": {"content": " world"}, "finish_reason": null}]}

// 3. Final chunk - empty delta with finish_reason
data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

data: [DONE]
```
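A self-contained Go sketch of this flow (types and names are illustrative, not LocalAI's actual schema) shows how a nil `*string` naturally serializes to `null` until the final chunk:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type delta struct {
	Role    string `json:"role,omitempty"`
	Content string `json:"content,omitempty"`
}

type choice struct {
	Index        int     `json:"index"`
	Delta        delta   `json:"delta"`
	FinishReason *string `json:"finish_reason"` // nil marshals to null
}

type chunk struct {
	Choices []choice `json:"choices"`
}

func emit(w *bufio.Writer, c chunk) {
	b, _ := json.Marshal(c) // error handling elided in this sketch
	fmt.Fprintf(w, "data: %s\n\n", b)
}

func main() {
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()

	// 1. Initial chunk: role only, finish_reason is null.
	emit(w, chunk{Choices: []choice{{Delta: delta{Role: "assistant"}}}})

	// 2. Content chunks: finish_reason stays null.
	for _, tok := range []string{"Hello", " world"} {
		emit(w, chunk{Choices: []choice{{Delta: delta{Content: tok}}}})
	}

	// 3. Final chunk: empty delta, finish_reason "stop", then [DONE].
	stop := "stop"
	emit(w, chunk{Choices: []choice{{FinishReason: &stop}}})
	fmt.Fprint(w, "data: [DONE]\n\n")
}
```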
### Files Modified

- `core/http/endpoints/openai/constants.go`: new file with finish reason constants
- `core/http/endpoints/openai/chat.go`: streaming fixes + use constants
- `core/http/endpoints/openai/completion.go`: streaming fixes + use constants
- `core/http/endpoints/openai/realtime.go`: consistency updates + use constants
- `core/schema/openai.go`: `FinishReason` schema change (sketched below)
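The `core/schema/openai.go` change likely amounts to something like the following; the surrounding fields and type names are assumptions, and only the `FinishReason` pointer change is confirmed by this PR:

```go
package schema

type Message struct {
	Role    string `json:"role,omitempty"`
	Content string `json:"content,omitempty"`
}

type Choice struct {
	Index int      `json:"index"`
	Delta *Message `json:"delta,omitempty"`
	// Pointer without omitempty: nil marshals to null on streaming
	// chunks; the final chunk sets it to a value such as "stop".
	FinishReason *string `json:"finish_reason"`
}
```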
### Validation
✅ Passes go vet and gofmt
✅ No security vulnerabilities (CodeQL)
✅ OpenAI API specification compliant
✅ Backward compatible for non-streaming responses
Fixes #7180
## Original prompt

This section details the original issue you should resolve.
<issue_title>Address streaming responses incongruences</issue_title>
<issue_description>
**LocalAI version:** 3.7.0

**Environment, CPU architecture, OS, and Version:**

**Describe the bug**
It seems that the way streaming is implemented in LocalAI breaks some clients, such as https://github.com/sst/opencode/issues/3994; see also https://github.com/mudler/LocalAI/issues/1194

**To Reproduce**
N/A yet

**Expected behavior**
Both opencode and other software using streaming tokens should work

**Logs**

**Additional context**
- A list of things that don't seem to work is noted in https://github.com/sst/opencode/issues/3994#issuecomment-3499141187
- The streaming implementation for OpenAI is in https://github.com/mudler/LocalAI/blob/master/core/http/endpoints/openai/chat.go
</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes mudler/LocalAI#7180
@copilot try to address additionally also these points:
- Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null
- Empty initial chunks: Sends empty content chunks that cause streaming clients to terminate early
- Missing proper stream termination: Doesn't properly signal completion to OpenCode's processor
> @copilot try to address additionally also these points:
> - Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null ...
All three issues have been addressed in commit deefcb0:

- **Empty finish_reason**: changed the schema to use `*string` with `omitempty` - streaming chunks now omit the field (null) while final chunks properly set it to `"stop"`
- **Empty initial chunks**: the initial message now sends only the role without a content field - removed `Content: &textContentToReturn`, which was empty
- **Missing proper stream termination**: the final chunk now sends an empty delta `{}` with `finish_reason`, followed by `data: [DONE]` per the OpenAI spec
The streaming format now matches OpenAI's specification:

- Initial: `{"delta": {"role": "assistant"}}`
- Chunks: `{"delta": {"content": "text"}}` (no finish_reason)
- Final: `{"delta": {}, "finish_reason": "stop"}` + `data: [DONE]`
@copilot we shouldn't omit finish_reason, it should either have a string value (e.g. "stop") or be null, see this example:

```text
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}
```
> @copilot we shouldn't omit finish_reason, it should either have a string value (e.g. "stop") or be null, see this example:
> ...
Fixed in commit 1bdc171. Removed `omitempty` from the `FinishReason` field and explicitly set it to `nil` for all streaming chunks. The field now always appears with either `null` or a string value like `"stop"`, matching the OpenAI spec example you provided:

```text
// Streaming chunks now have:
"finish_reason": null

// Final chunk has:
"finish_reason": "stop"
```