
SSE (server-sent events) not supported

Open · zengqingfu1442 opened this issue 9 months ago · 4 comments

Self Checks

  • [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.6

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I deployed an OpenAI-API-compatible server that streams in SSE mode, added the line print(f'decoded_chunk: {decoded_chunk}') under this line https://github.com/langgenius/dify/blob/0.6.6/api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py#L433, and then called its API with the following script:

import os

from core.model_runtime.entities.message_entities import (
    SystemPromptMessage,
    UserPromptMessage,
)
from core.model_runtime.model_providers.openai_api_compatible.llm.llm import OAIAPICompatLargeLanguageModel

"""
Using Together.ai's OpenAI-compatible API as testing endpoint
"""


def func():
    # Invoke the model in streaming mode and print each parsed chunk.
    model = OAIAPICompatLargeLanguageModel()

    response = model.invoke(
        model='qwen2_14b_kv_v3',
        credentials={
            'api_key': os.environ.get('TOGETHER_API_KEY'),
            'endpoint_url': 'http://172.16.11.242:8080/v1/',
            'mode': 'chat',
            'stream_mode_delimiter': '\\n\\n'  # delimiter used to split streamed chunks
        },
        prompt_messages=[
            SystemPromptMessage(
                content='You are a helpful AI assistant.',
            ),
            UserPromptMessage(
                content='Who are you?'
            )
        ],
        model_parameters={
            'temperature': 1.0,
            'top_k': 2,
            'top_p': 0.5,
        },
        stop=['How'],
        stream=True,
        user="abc-123"
    )
    for chunk in response:
        print(chunk)


if __name__ == '__main__':
    func()

✔️ Expected Behavior

The "data: " prefix is stripped from every chunk.

❌ Actual Behavior

Only the first chunk's "data: " prefix is stripped, so the JSON string of each subsequent chunk fails to decode.

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/home/vscode/.local/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
decoded_chunk: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"I"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" large"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" language"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" model"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" created"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" by"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Alibaba"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Cloud"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" I"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" called"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Q"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"wen"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"\",\"model_name\":\"qwen2_14b_kv_v3"},"finish_reason":null}]}

data: {"model":"qwen2_14b_kv_v3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
model='qwen2_14b_kv_v3' prompt_messages=[SystemPromptMessage(role=<PromptMessageRole.SYSTEM: 'system'>, content='You are a helpful AI assistant.', name=None), UserPromptMessage(role=<PromptMessageRole.USER: 'user'>, content='Who are you?', name=None)] system_fingerprint=None delta=LLMResultChunkDelta(index=1, message=AssistantPromptMessage(role=<PromptMessageRole.ASSISTANT: 'assistant'>, content='', name=None, tool_calls=[]), usage=LLMUsage(prompt_tokens=7, prompt_unit_price=Decimal('0'), prompt_price_unit=Decimal('0'), prompt_price=Decimal('0E-7'), completion_tokens=0, completion_unit_price=Decimal('0'), completion_price_unit=Decimal('0'), completion_price=Decimal('0E-7'), total_tokens=7, total_price=Decimal('0E-7'), currency='USD', latency=7.4800044149160385), finish_reason='Non-JSON encountered.')
model='qwen2_14b_kv_v3' prompt_messages=[SystemPromptMessage(role=<PromptMessageRole.SYSTEM: 'system'>, content='You are a helpful AI assistant.', name=None), UserPromptMessage(role=<PromptMessageRole.USER: 'user'>, content='Who are you?', name=None)] system_fingerprint=None delta=LLMResultChunkDelta(index=0, message=AssistantPromptMessage(role=<PromptMessageRole.ASSISTANT: 'assistant'>, content='', name=None, tool_calls=[]), usage=LLMUsage(prompt_tokens=7, prompt_unit_price=Decimal('0'), prompt_price_unit=Decimal('0'), prompt_price=Decimal('0E-7'), completion_tokens=0, completion_unit_price=Decimal('0'), completion_price_unit=Decimal('0'), completion_price=Decimal('0E-7'), total_tokens=7, total_price=Decimal('0E-7'), currency='USD', latency=7.480802866164595), finish_reason='Unknown')
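The log above suggests that the "data: " prefix was removed only once from a buffer that still contained several SSE events (for instance, if the events are separated by a byte sequence the client's delimiter never matches). A minimal sketch of that failure mode and of per-event stripping (not Dify's actual code; the payloads are illustrative):

import json

raw = (
    'data: {"choices":[{"delta":{"content":"I"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":" am"}}]}\n\n'
)

# Buggy: treat the whole buffer as one chunk and strip a single prefix.
buggy = raw.removeprefix("data: ")
try:
    json.loads(buggy)
except json.JSONDecodeError as e:
    print("decode fails:", e)  # the remaining "data: " lines break the JSON

# Correct: split into events first, then strip the prefix from each one.
for event in filter(None, raw.split("\n\n")):
    print(json.loads(event.removeprefix("data: ")))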

zengqingfu1442 · May 06 '24 10:05

It appears the issue you're encountering with server-side events (SSE) in Dify version 0.6.6, where only the first chunk's data: prefix is stripped, can be addressed by focusing on the handling of streamed responses. The core of the solution involves ensuring that each chunk received from the SSE stream is correctly processed to strip the data: prefix and that the JSON data is properly decoded.

To resolve this, you should look into the handling of SSE in the SSELineParser class and the Event class's json_data method to ensure they correctly process the streamed data. Specifically, verify that the SSELineParser correctly accumulates data lines for each event and that the Event class properly prepares the data for JSON decoding. This might involve ensuring that data split across multiple chunks or events is correctly concatenated before decoding.

Additionally, review the StreamResponse class to ensure it correctly handles each event and that the data passed to _data_process_func is correctly formatted for JSON decoding. If the JSON data is not correctly formatted or if there are issues in how the data: prefix is handled, it could lead to the decoding failures you've observed.

If modifications are needed, they should focus on enhancing the event parsing and data processing flow to handle the data: prefix correctly across all chunks and ensure the JSON data is valid before attempting to decode it.

This approach should help address the issue with SSE support in Dify, ensuring that all chunks are correctly processed and that JSON decoding succeeds for streamed responses.
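The SSELineParser and Event classes referenced above are not shown in this thread; what follows is a generic sketch of the accumulation Dosu describes, following the SSE spec ("data:" lines belonging to one event are joined with newlines, and the event is dispatched when a blank line arrives):

import json

def iter_sse_events(lines):
    """Yield the joined data payload of each SSE event in an iterable of lines."""
    data_lines = []
    for line in lines:
        if line.startswith(":"):   # SSE comment, ignore
            continue
        if line == "":             # blank line terminates the current event
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())

stream = [
    'data: {"choices":[{"delta":{"content":"I"}}]}',
    "",
    "data: [DONE]",
    "",
]
for data in iter_sse_events(stream):
    if data != "[DONE]":
        print(json.loads(data))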


dosubot[bot] · May 06 '24 10:05

According to the OpenAI API specifications, the results returned in a streaming fashion are not SSE, but rather a stream of one JSON object per line. Check out the documentation here: https://platform.openai.com/docs/api-reference/chat/create
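For reference, the two wire formats being contrasted look like this (the payloads are illustrative). A stream of one JSON object per line:

{"choices":[{"index":0,"delta":{"content":"I"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}

versus SSE framing, where each event carries a "data: " prefix and is terminated by a blank line:

data: {"choices":[{"index":0,"delta":{"content":"I"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":" am"},"finish_reason":null}]}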

takatost · May 07 '24 07:05

According to the OpenAI API specifications, the results returned in a streaming fashion are not SSE, but rather a stream of one JSON object per line. Check out the documentation here: https://platform.openai.com/docs/api-reference/chat/create

I replaced EventSourceResponse with StreamingResponse on my server, and the Dify openai-api-compatible LLM client can now correctly parse the response.

from

from sse_starlette.sse import EventSourceResponse

to

from fastapi.responses import StreamingResponse
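A minimal sketch of the server-side change described above (the endpoint and payloads are hypothetical; the real server code is not shown in this thread). StreamingResponse sends the yielded bytes as-is, so the generator controls the exact "data: ..." framing, whereas sse_starlette's EventSourceResponse applies its own SSE framing (with \r\n line endings by default), which a client splitting on "\n\n" would never match:

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def generate_chunks():
    # hypothetical stand-in for real model output
    for token in ["I", " am", " Qwen", "."]:
        payload = {
            "model": "qwen2_14b_kv_v3",
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": token}, "finish_reason": None}],
        }
        # one "data: {json}" block per chunk, terminated by a blank line
        yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"

@app.post("/v1/chat/completions")
def chat_completions():
    return StreamingResponse(generate_chunks(), media_type="text/event-stream")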

zengqingfu1442 · May 07 '24 08:05

@zengqingfu1442 Could you submit a PR for this?

crazywoola · May 29 '24 11:05

@zengqingfu1442 Could you submit a PR for this?

I have no idea how to fix this.

zengqingfu1442 · Jul 30 '24 10:07