SSE (server-sent events) not supported
Self Checks
- [X] This is only for bug report; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.6
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I deployed an OpenAI-API-compatible server that streams its responses in SSE mode, added the debug line `print(f'decoded_chunk: {decoded_chunk}')` below this line: https://github.com/langgenius/dify/blob/0.6.6/api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py#L433, and then called its API with the following script:
```python
import os
from collections.abc import Generator

import pytest

from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta
from core.model_runtime.entities.message_entities import (
    AssistantPromptMessage,
    PromptMessageTool,
    SystemPromptMessage,
    UserPromptMessage,
)
from core.model_runtime.errors.validate import CredentialsValidateFailedError
from core.model_runtime.model_providers.openai_api_compatible.llm.llm import OAIAPICompatLargeLanguageModel

"""
Using Together.ai's OpenAI-compatible API as testing endpoint
"""


def func():
    model = OAIAPICompatLargeLanguageModel()

    response = model.invoke(
        model='qwen2_14b_kv_v3',
        credentials={
            'api_key': os.environ.get('TOGETHER_API_KEY'),
            'endpoint_url': 'http://172.16.11.242:8080/v1/',
            'mode': 'chat',
            'stream_mode_delimiter': '\\n\\n'
        },
        prompt_messages=[
            SystemPromptMessage(
                content='You are a helpful AI assistant.',
            ),
            UserPromptMessage(
                content='Who are you?'
            )
        ],
        model_parameters={
            'temperature': 1.0,
            'top_k': 2,
            'top_p': 0.5,
        },
        stop=['How'],
        stream=True,
        user="abc-123"
    )

    for chunk in response:
        print(chunk)


if __name__ == '__main__':
    func()
```
✔️ Expected Behavior
The `data:` prefix is stripped from every chunk.
❌ Actual Behavior
Only the first chunk's `data:` prefix is stripped, and the JSON string then fails to be decoded.
```
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/home/vscode/.local/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
decoded_chunk: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"I"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" large"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" language"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" model"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" created"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" by"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Alibaba"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Cloud"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" I"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" called"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Q"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"wen"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"\",\"model_name\":\"qwen2_14b_kv_v3"},"finish_reason":null}]}
data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
model='qwen2_14b_kv_v3' prompt_messages=[SystemPromptMessage(role=<PromptMessageRole.SYSTEM: 'system'>, content='You are a helpful AI assistant.', name=None), UserPromptMessage(role=<PromptMessageRole.USER: 'user'>, content='Who are you?', name=None)] system_fingerprint=None delta=LLMResultChunkDelta(index=1, message=AssistantPromptMessage(role=<PromptMessageRole.ASSISTANT: 'assistant'>, content='', name=None, tool_calls=[]), usage=LLMUsage(prompt_tokens=7, prompt_unit_price=Decimal('0'), prompt_price_unit=Decimal('0'), prompt_price=Decimal('0E-7'), completion_tokens=0, completion_unit_price=Decimal('0'), completion_price_unit=Decimal('0'), completion_price=Decimal('0E-7'), total_tokens=7, total_price=Decimal('0E-7'), currency='USD', latency=7.4800044149160385), finish_reason='Non-JSON encountered.')
model='qwen2_14b_kv_v3' prompt_messages=[SystemPromptMessage(role=<PromptMessageRole.SYSTEM: 'system'>, content='You are a helpful AI assistant.', name=None), UserPromptMessage(role=<PromptMessageRole.USER: 'user'>, content='Who are you?', name=None)] system_fingerprint=None delta=LLMResultChunkDelta(index=0, message=AssistantPromptMessage(role=<PromptMessageRole.ASSISTANT: 'assistant'>, content='', name=None, tool_calls=[]), usage=LLMUsage(prompt_tokens=7, prompt_unit_price=Decimal('0'), prompt_price_unit=Decimal('0'), prompt_price=Decimal('0E-7'), completion_tokens=0, completion_unit_price=Decimal('0'), completion_price_unit=Decimal('0'), completion_price=Decimal('0E-7'), total_tokens=7, total_price=Decimal('0E-7'), currency='USD', latency=7.480802866164595), finish_reason='Unknown')
```
It appears the issue you're encountering with server-side events (SSE) in Dify version 0.6.6, where only the first chunk's `data:` prefix is stripped, can be addressed by focusing on the handling of streamed responses. The core of the solution involves ensuring that each chunk received from the SSE stream is correctly processed to strip the `data:` prefix and that the JSON data is properly decoded.

To resolve this, you should look into the handling of SSE in the `SSELineParser` class and the `Event` class's `json_data` method to ensure they correctly process the streamed data. Specifically, verify that the `SSELineParser` correctly accumulates data lines for each event and that the `Event` class properly prepares the data for JSON decoding. This might involve ensuring that data split across multiple chunks or events is correctly concatenated before decoding.

Additionally, review the `StreamResponse` class to ensure it correctly handles each event and that the data passed to `_data_process_func` is correctly formatted for JSON decoding. If the JSON data is not correctly formatted, or if there are issues in how the `data:` prefix is handled, it could lead to the decoding failures you've observed.

If modifications are needed, they should focus on enhancing the event parsing and data processing flow to handle the `data:` prefix correctly across all chunks and to ensure the JSON data is valid before attempting to decode it.

This approach should help address the issue with SSE support in Dify, ensuring that all chunks are correctly processed and that JSON decoding succeeds for streamed responses.
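As a point of reference only, here is a minimal sketch of such a parsing loop (this is not Dify's or the zhipuai SDK's actual implementation; the function name, parameters, and sample payloads are illustrative). It buffers text fragments, splits on the event delimiter, and strips the `data:` prefix from every event rather than only the first one:

```python
import json


def iter_sse_json(raw_chunks, delimiter="\n\n"):
    """Yield decoded JSON objects from an SSE-style text stream.

    `raw_chunks` is any iterable of decoded text fragments (e.g. HTTP chunks).
    Fragments are buffered, split on the event delimiter, and the "data:"
    prefix is stripped from every event, not only the first one.
    """
    buffer = ""
    for fragment in raw_chunks:
        buffer += fragment
        while delimiter in buffer:
            event, buffer = buffer.split(delimiter, 1)
            event = event.strip()
            if not event:
                continue
            if event.startswith("data:"):
                event = event[len("data:"):].strip()
            if event == "[DONE]":
                return
            yield json.loads(event)


if __name__ == "__main__":
    # A "data:"-prefixed payload split across two HTTP chunks still decodes.
    chunks = ['data: {"a": 1}\n\ndata: {"a"', ': 2}\n\ndata: [DONE]\n\n']
    print(list(iter_sse_json(chunks)))  # -> [{'a': 1}, {'a': 2}]
```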
Sources
- api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py
- api/core/model_runtime/model_providers/zhipuai/zhipuai_sdk/core/_sse_client.py
- web/service/base.ts
- api/libs/json_in_md_parser.py
- api/core/model_runtime/model_providers/moonshot/llm/llm.py
- api/core/splitter/text_splitter.py
- en/features/workflow/preview-and-run/preview-and-run.md
- en/features/extension/api_based_extension/README.md
- en/tutorials/advanced-tool-integration.md
According to the OpenAI API specifications, the results returned in a streaming fashion are not SSE, but rather a stream of one JSON object per line. Check out the documentation here: https://platform.openai.com/docs/api-reference/chat/create
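For illustration only (hypothetical payloads, not captured from any server), the two stream shapes being contrasted above look roughly like this on the wire:

```python
# SSE framing: each event is prefixed with "data:" and terminated by a blank line.
sse_stream = (
    'data: {"choices":[{"delta":{"content":"I"},"finish_reason":null}]}\n\n'
    'data: [DONE]\n\n'
)

# Newline-delimited streaming: one JSON object per line, no "data:" prefix.
json_lines_stream = (
    '{"choices":[{"delta":{"content":"I"},"finish_reason":null}]}\n'
)
```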
I replaced `EventSourceResponse` with `StreamingResponse` in my server, and the Dify openai-api-compatible LLM client can now correctly parse the response.

from

```python
from sse_starlette.sse import EventSourceResponse
```

to

```python
from fastapi.responses import StreamingResponse
```
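As a rough sketch of what that server-side change looks like (not the reporter's actual server code; the route, generator, and payloads are made up for illustration), the key difference is whether the framework adds the SSE framing for you or sends exactly what your generator yields:

```python
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from sse_starlette.sse import EventSourceResponse

app = FastAPI()


def chunk_json_strings():
    # Stand-in for the real model stream: OpenAI-style chat.completion.chunk payloads.
    for token in ["I", " am", " Qwen", "."]:
        yield json.dumps({
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": token}, "finish_reason": None}],
        })
    yield "[DONE]"


@app.post("/v1/chat/completions")
def chat_completions():
    # Variant A (original): sse_starlette wraps every yielded string in SSE
    # framing (a "data:" prefix plus an event separator) before sending it.
    # return EventSourceResponse(chunk_json_strings())

    # Variant B (the change described above): StreamingResponse sends exactly
    # what the generator yields -- here, one JSON object per line with no
    # framework-added "data:" prefix. The media_type may need to match
    # whatever the client expects.
    return StreamingResponse(
        (chunk + "\n" for chunk in chunk_json_strings()),
        media_type="text/event-stream",
    )
```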
@zengqingfu1442 Could you submit a PR for this?
I have no idea how to fix this.