# Fix SSE streaming format to comply with specification
- [x] Understand the current streaming implementation and identify issues
- [x] Fix SSE format in chat.go to explicitly use proper formatting
- [x] Fix SSE format in completion.go to explicitly use proper formatting
- [x] Ensure consistency across all streaming responses
- [x] Verify changes with format validation tests
- [x] Run code quality checks (go vet, gofmt)
- [x] Address code review feedback for error handling
- [x] Fix finish_reason to use pointer for proper null handling
- [x] Remove empty content from initial streaming chunks
- [x] Ensure proper stream termination with finish_reason
- [x] Improve code consistency
- [x] Move common finish reasons to constants
## Complete Solution
This PR fixes all streaming response issues causing OpenAI API client failures:
### SSE Format Fixes ✅

- **Explicit SSE formatting**: `data: {json}\n\n` with a proper `%s` specifier (sketched below)
- **Error message streaming**: errors are sent as proper JSON with correct SSE format
- **Robust error handling**: explicit marshal-error checking with fallbacks
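As a concrete illustration, here is a minimal sketch of the kind of explicit SSE write described above. The helper name, writer type, and fallback payload are assumptions for illustration, not LocalAI's actual code:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// writeSSEChunk marshals a chunk and writes it with explicit
// "data: <json>\n\n" framing. Marshal errors are checked, and a
// well-formed SSE error event is sent as a fallback so the stream
// is never corrupted by a raw Go error string.
func writeSSEChunk(w *bufio.Writer, v any) error {
	payload, err := json.Marshal(v)
	if err != nil {
		fmt.Fprint(w, "data: {\"error\":\"failed to marshal streaming response\"}\n\n")
		w.Flush()
		return err
	}
	// Explicit %s specifier, as described above.
	fmt.Fprintf(w, "data: %s\n\n", payload)
	return w.Flush()
}

func main() {
	w := bufio.NewWriter(os.Stdout)
	writeSSEChunk(w, map[string]string{"content": "Hello"})
}
```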
### OpenAI Compliance Fixes ✅

- **Empty finish_reason**: changed the schema field to `*string`
  - Streaming chunks: `"finish_reason": null`
  - Final chunk: `"finish_reason": "stop"`
- **Empty initial chunks**: the initial message now sends only the role
  - Before: `{"delta": {"role": "assistant", "content": ""}}`
  - After: `{"delta": {"role": "assistant"}}`
- **Proper stream termination**: the final chunk has an empty delta with `finish_reason`
  - Before: `{"delta": {"content": ""}, "finish_reason": "stop"}`
  - After: `{"delta": {}, "finish_reason": "stop"}` + `data: [DONE]`
### Code Quality Improvements ✅

- **Constants for finish reasons**: centralized constants prevent typos and improve maintainability (see the sketch after this list)
  - `FinishReasonStop = "stop"`
  - `FinishReasonToolCalls = "tool_calls"`
  - `FinishReasonFunctionCall = "function_call"`
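The constant values come straight from this PR; a plausible shape for the new `constants.go` file (exact layout assumed) would be:

```go
package openai

// Finish reasons returned in the finish_reason field of chat and
// completion responses, centralized to prevent typos.
const (
	FinishReasonStop         = "stop"
	FinishReasonToolCalls    = "tool_calls"
	FinishReasonFunctionCall = "function_call"
)
```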
### Streaming Flow (OpenAI Compliant)

```text
// 1. Initial chunk - only role, finish_reason is null
data: {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}

// 2. Content chunks - finish_reason stays null
data: {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": null}]}
data: {"choices": [{"index": 0, "delta": {"content": " world"}, "finish_reason": null}]}

// 3. Final chunk - empty delta with finish_reason
data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

data: [DONE]
```
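A self-contained Go sketch of this flow (types and names are illustrative, not LocalAI's actual schema) shows how a nil `*string` naturally serializes to `null` until the final chunk:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type delta struct {
	Role    string `json:"role,omitempty"`
	Content string `json:"content,omitempty"`
}

type choice struct {
	Index        int     `json:"index"`
	Delta        delta   `json:"delta"`
	FinishReason *string `json:"finish_reason"` // nil marshals to null
}

type chunk struct {
	Choices []choice `json:"choices"`
}

func emit(w *bufio.Writer, c chunk) {
	b, _ := json.Marshal(c) // error handling elided in this sketch
	fmt.Fprintf(w, "data: %s\n\n", b)
}

func main() {
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()

	// 1. Initial chunk: role only, finish_reason is null.
	emit(w, chunk{Choices: []choice{{Delta: delta{Role: "assistant"}}}})

	// 2. Content chunks: finish_reason stays null.
	for _, tok := range []string{"Hello", " world"} {
		emit(w, chunk{Choices: []choice{{Delta: delta{Content: tok}}}})
	}

	// 3. Final chunk: empty delta, finish_reason "stop", then [DONE].
	stop := "stop"
	emit(w, chunk{Choices: []choice{{FinishReason: &stop}}})
	fmt.Fprint(w, "data: [DONE]\n\n")
}
```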
### Files Modified

- `core/http/endpoints/openai/constants.go`: new file with finish reason constants
- `core/http/endpoints/openai/chat.go`: streaming fixes + use constants
- `core/http/endpoints/openai/completion.go`: streaming fixes + use constants
- `core/http/endpoints/openai/realtime.go`: consistency updates + use constants
- `core/schema/openai.go`: `FinishReason` schema change (sketched below)
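The `core/schema/openai.go` change likely amounts to something like the following; the surrounding fields and type names are assumptions, and only the `FinishReason` pointer change is confirmed by this PR:

```go
package schema

type Message struct {
	Role    string `json:"role,omitempty"`
	Content string `json:"content,omitempty"`
}

type Choice struct {
	Index int      `json:"index"`
	Delta *Message `json:"delta,omitempty"`
	// Pointer without omitempty: nil marshals to null on streaming
	// chunks; the final chunk sets it to a value such as "stop".
	FinishReason *string `json:"finish_reason"`
}
```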
### Validation
✅ Passes go vet and gofmt
✅ No security vulnerabilities (CodeQL)
✅ OpenAI API specification compliant
✅ Backward compatible for non-streaming responses
Fixes #7180
## Original prompt

This section details the original issue you should resolve.
<issue_title>Address streaming responses incongruences</issue_title>
<issue_description>
**LocalAI version:** 3.7.0

**Environment, CPU architecture, OS, and Version:**

**Describe the bug**
It seems that the way streaming is implemented in LocalAI breaks some clients, such as https://github.com/sst/opencode/issues/3994; see also https://github.com/mudler/LocalAI/issues/1194

**To Reproduce**
N/A yet

**Expected behavior**
Both opencode and other software using streaming tokens should work

**Logs**

**Additional context**
- A list of things that don't seem to work is noted in https://github.com/sst/opencode/issues/3994#issuecomment-3499141187
- The streaming implementation for OpenAI is in https://github.com/mudler/LocalAI/blob/master/core/http/endpoints/openai/chat.go
</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes mudler/LocalAI#7180
@copilot try to address additionally also these points:
- Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null
- Empty initial chunks: Sends empty content chunks that cause streaming clients to terminate early
- Missing proper stream termination: Doesn't properly signal completion to OpenCode's processor
> @copilot try to address additionally also these points:
> - Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null ...
All three issues have been addressed in commit deefcb0:

- **Empty finish_reason**: changed the schema to use `*string` with `omitempty` - streaming chunks now omit the field (null) while final chunks properly set it to `"stop"`
- **Empty initial chunks**: the initial message now sends only the role without a content field - removed `Content: &textContentToReturn`, which was empty
- **Missing proper stream termination**: the final chunk now sends an empty delta `{}` with `finish_reason`, followed by `data: [DONE]` per the OpenAI spec
The streaming format now matches OpenAI's specification:

- Initial: `{"delta": {"role": "assistant"}}`
- Chunks: `{"delta": {"content": "text"}}` (no finish_reason)
- Final: `{"delta": {}, "finish_reason": "stop"}` + `data: [DONE]`
@copilot we shouldn't omit finish_reason, it should either have a string value (e.g. "stop") or be null, see this example:

```text
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}
```
> @copilot we shouldn't omit finish_reason, it should either have a string value (e.g. "stop") or be null, see this example:
> ...
Fixed in commit 1bdc171. Removed `omitempty` from the `FinishReason` field and explicitly set it to `nil` for all streaming chunks. The field now always appears with either `null` or a string value like `"stop"`, matching the OpenAI spec example you provided:

```text
// Streaming chunks now have:
"finish_reason": null

// Final chunk has:
"finish_reason": "stop"
```