Fix SSE streaming format to comply with specification

Open Copilot opened this issue 2 months ago • 3 comments

  • [x] Understand the current streaming implementation and identify issues
  • [x] Fix SSE format in chat.go to explicitly use proper formatting
  • [x] Fix SSE format in completion.go to explicitly use proper formatting
  • [x] Ensure consistency across all streaming responses
  • [x] Verify changes with format validation tests
  • [x] Run code quality checks (go vet, gofmt)
  • [x] Address code review feedback for error handling
  • [x] Fix finish_reason to use pointer for proper null handling
  • [x] Remove empty content from initial streaming chunks
  • [x] Ensure proper stream termination with finish_reason
  • [x] Improve code consistency
  • [x] Move common finish reasons to constants

Complete Solution

This PR fixes all streaming response issues causing OpenAI API client failures:

SSE Format Fixes ✅

  1. Explicit SSE formatting: data: {json}\n\n with proper %s specifier
  2. Error message streaming: Proper JSON responses with correct SSE format
  3. Robust error handling: Explicit marshal error checking with fallbacks

OpenAI Compliance Fixes ✅

  1. Empty finish_reason: Changed to *string with omitempty

    • Streaming chunks: field omitted (null)
    • Final chunk: "finish_reason": "stop"
  2. Empty initial chunks: Initial message now only sends role

    • Before: {"delta": {"role": "assistant", "content": ""}}
    • After: {"delta": {"role": "assistant"}}
  3. Proper stream termination: Final chunk has empty delta with finish_reason

    • Before: {"delta": {"content": ""}, "finish_reason": "stop"}
    • After: {"delta": {}, "finish_reason": "stop"} + data: [DONE]
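
To see why dropping the empty content pointer changes the wire format, here is a small sketch. The `delta` type is a hypothetical stand-in, not the struct from core/schema/openai.go; it only assumes that the content field is a pointer tagged with `omitempty`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// delta is a hypothetical stand-in for the streamed message fragment.
type delta struct {
	Role    string  `json:"role,omitempty"`
	Content *string `json:"content,omitempty"`
}

// encode returns the JSON form of d; errors are ignored because this
// struct always marshals successfully.
func encode(d delta) string {
	b, _ := json.Marshal(d)
	return string(b)
}

func main() {
	empty := ""
	hello := "Hello"

	// A non-nil pointer to "" is still emitted, which reproduces the "Before" shape.
	fmt.Println(encode(delta{Role: "assistant", Content: &empty}))
	// Leaving the pointer nil drops the field, which gives the "After" shape.
	fmt.Println(encode(delta{Role: "assistant"}))
	// A regular content chunk.
	fmt.Println(encode(delta{Content: &hello}))
	// The final chunk carries an empty delta.
	fmt.Println(encode(delta{}))
}
```

With `omitempty` on a pointer, only a nil pointer is omitted; a pointer to an empty string is serialized as `""`, which is why the empty content had to be removed at the call site rather than in the struct tag.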

Code Quality Improvements ✅

  1. Constants for finish reasons: Centralized constants prevent typos and improve maintainability
    • FinishReasonStop = "stop"
    • FinishReasonToolCalls = "tool_calls"
    • FinishReasonFunctionCall = "function_call"
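
A sketch of how such constants might be declared and used with a pointer-typed field follows. The constant names and values come from the list above; the `strPtr` helper is an assumption added for the example, since Go does not allow taking the address of a constant directly.

```go
package main

import "fmt"

// Finish reason constants, with the names and values listed in the PR description.
const (
	FinishReasonStop         = "stop"
	FinishReasonToolCalls    = "tool_calls"
	FinishReasonFunctionCall = "function_call"
)

// strPtr is a hypothetical helper: it copies s into a variable and returns
// its address, so a constant can be assigned to a *string field.
func strPtr(s string) *string { return &s }

func main() {
	reason := strPtr(FinishReasonStop)
	fmt.Println(*reason)
}
```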

Streaming Flow (OpenAI Compliant)

// 1. Initial chunk - only role
data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}

// 2. Content chunks - no finish_reason field
data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}

data: {"choices": [{"index": 0, "delta": {"content": " world"}}]}

// 3. Final chunk - empty delta with finish_reason
data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

data: [DONE]

Files Modified

  • core/http/endpoints/openai/constants.go: New file with finish reason constants
  • core/http/endpoints/openai/chat.go: Streaming fixes + use constants
  • core/http/endpoints/openai/completion.go: Streaming fixes + use constants
  • core/http/endpoints/openai/realtime.go: Consistency updates + use constants
  • core/schema/openai.go: FinishReason schema change

Validation

✅ Passes go vet and gofmt
✅ No security vulnerabilities (CodeQL)
✅ OpenAI API specification compliant
✅ Backward compatible for non-streaming responses

Fixes #7180

Original prompt

This section details the original issue you should resolve

<issue_title>Address streaming responses incongruences</issue_title> <issue_description>LocalAI version: 3.7.0

Environment, CPU architecture, OS, and Version:

Describe the bug It seems that the way streaming is implemented in LocalAI breaks some clients, such as opencode (https://github.com/sst/opencode/issues/3994); see also https://github.com/mudler/LocalAI/issues/1194

To Reproduce N/A yet

Expected behavior Both opencode and other software that consumes streamed tokens should work

Logs

Additional context

  • A list of things that don't seem to work is given in https://github.com/sst/opencode/issues/3994#issuecomment-3499141187
  • Streaming implementation is in https://github.com/mudler/LocalAI/blob/master/core/http/endpoints/openai/chat.go for OpenAI </issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes mudler/LocalAI#7180

Copilot avatar Nov 07 '25 19:11 Copilot

Deploy Preview for localai ready!

Name Link
Latest commit 22333d3e3575670dc620fa0d56966829f5e84f65
Latest deploy log https://app.netlify.com/projects/localai/deploys/6910fe7e6056e60008217314
Deploy Preview https://deploy-preview-7182--localai.netlify.app

netlify[bot] avatar Nov 07 '25 19:11 netlify[bot]

@copilot please also address these points:

  • Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null
  • Empty initial chunks: Sends empty content chunks that cause streaming clients to terminate early
  • Missing proper stream termination: Doesn't properly signal completion to OpenCode's processor

mudler avatar Nov 07 '25 20:11 mudler

@copilot please also address these points:

  • Empty finish_reason: LocalAI sends "finish_reason": "" instead of "stop" or null ...

All three issues have been addressed in commit deefcb0:

  1. Empty finish_reason: Changed schema to use *string with omitempty - streaming chunks now omit the field (null) while final chunks properly set it to "stop"

  2. Empty initial chunks: Initial message now only sends role without content field - removed Content: &textContentToReturn which was empty

  3. Missing proper stream termination: Final chunk now sends empty delta {} with finish_reason, followed by data: [DONE] per OpenAI spec

The streaming format now matches OpenAI's specification:

  • Initial: {"delta": {"role": "assistant"}}
  • Chunks: {"delta": {"content": "text"}} (no finish_reason)
  • Final: {"delta": {}, "finish_reason": "stop"} + data: [DONE]

Copilot avatar Nov 07 '25 20:11 Copilot

@copilot we shouldn't omit finish_reason; it should either have a string value (e.g. "stop") or be null. See this example:

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"1"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-blablabla","object":"chat.completion.chunk","created":1684665888,"model":"gpt-4-0314","choices":[{"delta":{"content":"2"},"index":0,"finish_reason":null}]}

mudler avatar Nov 09 '25 20:11 mudler

@copilot we shouldn't omit finish_reason; it should either have a string value (e.g. "stop") or be null. See this example:

...

Fixed in commit 1bdc171. Removed omitempty from FinishReason field and explicitly set it to nil for all streaming chunks. Now the field always appears with either null or a string value like "stop", matching the OpenAI spec example you provided:

// Streaming chunks now have:
"finish_reason": null

// Final chunk has:
"finish_reason": "stop"

Copilot avatar Nov 09 '25 20:11 Copilot
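
The difference between the two revisions discussed in this thread (omitting finish_reason versus always emitting it as null or a string) comes down to a struct tag. The sketch below assumes simplified `choiceOmit` and `choice` types, which are hypothetical and not the actual definitions in core/schema/openai.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// choiceOmit mirrors the first revision: a nil pointer drops the field entirely.
type choiceOmit struct {
	Index        int     `json:"index"`
	FinishReason *string `json:"finish_reason,omitempty"`
}

// choice mirrors the second revision: without omitempty, a nil pointer
// is serialized as an explicit JSON null.
type choice struct {
	Index        int     `json:"index"`
	FinishReason *string `json:"finish_reason"`
}

// mustJSON marshals v and panics on failure, which keeps the example short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	stop := "stop"
	fmt.Println(mustJSON(choiceOmit{}))                  // {"index":0}
	fmt.Println(mustJSON(choice{}))                      // {"index":0,"finish_reason":null}
	fmt.Println(mustJSON(choice{FinishReason: &stop})) // {"index":0,"finish_reason":"stop"}
}
```

Keeping the pointer type while removing `omitempty` is what allows the same field to express both `null` for intermediate chunks and `"stop"` for the final one.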