How to fix Gemini stream mode?
@CXwudi Adding config_kwargs.pop("stream", None) to config_kwargs resolves the error, but the output is still not streamed. The _handle_generate_stream_response code seems to have a problem.
Originally posted by @AAEE86 in https://github.com/langgenius/dify/issues/8678#issuecomment-2391088192
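For context, here is a minimal, hypothetical sketch (not taken from the dify source; the model name and parameters are made up) of why that workaround is needed: the SDK's GenerationConfig presumably rejects a stream key, so the flag has to be popped out of config_kwargs and passed to generate_content() separately rather than silently dropped.

import os
import google.generativeai as genai  # the SDK the Google provider wraps; usage here is illustrative

config_kwargs = {"temperature": 0.7, "stream": True}  # hypothetical parameters
# keep the flag instead of discarding it, so streaming can still be requested
stream = bool(config_kwargs.pop("stream", True))

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
google_model = genai.GenerativeModel("gemini-1.5-pro")  # example model name
response = google_model.generate_content(
    "Hello",
    generation_config=genai.types.GenerationConfig(**config_kwargs),
    stream=stream,
)
for chunk in (response if stream else [response]):
    print(chunk.text)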
To address the issue of Gemini stream mode not producing streamed output, you should focus on the _handle_chat_generate_stream_response method, which is responsible for processing the streaming response. Here are a few things to check and potentially fix:
- Ensure the Response is Properly Iterated: The method iterates over the response object, which should be an Iterator. Make sure that the response object passed to this method is indeed an iterator and is yielding data as expected.
- Check the Data Parsing: The method attempts to parse each line of the response as JSON. If the response lines are not in the expected format, this could cause issues. Ensure that the response lines are correctly formatted JSON strings.
- Handle the End of Stream: The method checks for a line that equals "[DONE]" to determine the end of the stream. Ensure that this condition is correctly handled and that the stream is properly closed or returned when this line is encountered.
- Yielding the Stream Data: The method yields LLMResultChunk objects for each piece of data. Ensure that choice["delta"]["content"] is not empty before yielding, as the current code skips empty content.
Here is a snippet of the relevant code for reference:
def _handle_chat_generate_stream_response(
    self,
    model: str,
    prompt_messages: list[PromptMessage],
    credentials: dict,
    response: Iterator,
) -> Generator:
    for line in response:
        if not line:
            continue
        line = line.decode("utf-8")
        # remove the leading `data: ` prefix from each SSE line
        if line.startswith("data:"):
            line = line[5:].strip()
        try:
            data = json.loads(line)
        except Exception:
            if line.strip() == "[DONE]":
                return
            # skip lines that are neither valid JSON nor the end-of-stream
            # marker; falling through here would reference an undefined `data`
            continue
        choices = data.get("choices", [])
        stop_reason = ""
        for choice in choices:
            if choice.get("finish_reason"):
                stop_reason = choice["finish_reason"]
            if len(choice["delta"]["content"]) == 0:
                continue
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(content=choice["delta"]["content"], tool_calls=[]),
                    finish_reason=stop_reason,
                ),
            )
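For comparison, here is a standalone, runnable sketch of the same parsing pattern; the data: lines below are made up, assuming an OpenAI-compatible SSE payload:

import json

fake_stream = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": "stop"}]}',
    b"data: [DONE]",
]

def iter_contents(lines):
    # mirrors the handler above: strip "data: ", stop on "[DONE]", parse JSON,
    # and yield only non-empty delta contents
    for raw in lines:
        line = raw.decode("utf-8")
        if line.startswith("data:"):
            line = line[5:].strip()
        if line == "[DONE]":
            return
        data = json.loads(line)
        for choice in data.get("choices", []):
            content = choice.get("delta", {}).get("content", "")
            if content:
                yield content, choice.get("finish_reason")

for content, finish_reason in iter_contents(fake_stream):
    print(content, finish_reason)  # prints "Hel None", then "lo stop"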
Ensure that the response from the model is correctly formatted and that the streaming logic is correctly implemented in the _handle_chat_generate_stream_response method [1].
To continue talking to Dosu, mention @dosu.
We need to know how the stream parameter was initially passed in. I can override stream inside the function to fix the stream=True case, but when stream is False the non-streaming result still won't be displayed, because the external stream defaults to True.
When I changed _handle_generate_response to also return its result in stream format, the stream=False case was fixed as well. This suggests that the handling of the return value branches on the stream value. However, I'm currently unsure how to pass the original stream value into the modified function.
def _handle_invoke_result(
self, invoke_result: LLMResult | Generator
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
"""
Handle invoke result
:param invoke_result: invoke result
:return:
"""
if isinstance(invoke_result, LLMResult):
return
Hmm, okay, I found the place where it is handled. They didn't consider the stream=False case at all... Even though they require non-streaming handling when adding an LLM, they completely ignore the non-streaming return value.
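One possible direction, shown here only as a sketch (the entity classes and field names are assumed from dify's llm_entities, and this is not the actual fix in any submission), is to wrap the non-streaming LLMResult into a one-chunk generator instead of returning early, so the existing chunk-consuming path also covers stream=False:

from collections.abc import Generator

# import path and fields assumed from api/core/model_runtime/entities/llm_entities.py
from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta

def as_chunk_stream(result: LLMResult) -> Generator[LLMResultChunk, None, None]:
    # re-emit the blocking result as a single streaming chunk so code that
    # iterates over chunks keeps working when stream=False
    yield LLMResultChunk(
        model=result.model,
        prompt_messages=result.prompt_messages,
        system_fingerprint=result.system_fingerprint,
        delta=LLMResultChunkDelta(
            index=0,
            message=result.message,
            usage=result.usage,
            finish_reason="stop",
        ),
    )

_handle_invoke_result could then iterate over as_chunk_stream(invoke_result) instead of returning nothing.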
I don't quite understand all of this. Can you fix this issue? @Hisir0909
Hi @Hisir0909, I am just adding my two cents here: _handle_generate_response has nothing to do with this when stream=False. Based on
https://github.com/langgenius/dify/blob/7121afdd4426648b99055b7041b5e99bc7b1ad3a/api/core/model_runtime/model_providers/google/llm/llm.py#L217-L220
_handle_generate_stream_response is the method that handles the stream output. However, it correctly returns a generator type, which really confuses me.
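For anyone following along, the dispatch those lines perform can be pictured with this self-contained sketch using stub handlers (it is not the real dify code):

from collections.abc import Generator

def handle_generate_response(response) -> str:
    return "full response"  # stub: blocking path, returns everything at once

def handle_generate_stream_response(response) -> Generator[str, None, None]:
    yield "chunk"  # stub: streaming path, yields pieces as they arrive

def generate(response, stream: bool = True):
    # stream=True hands back a generator; stream=False hands back the final result
    if stream:
        return handle_generate_stream_response(response)
    return handle_generate_response(response)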
What do you mean? When stream=False, is _handle_generate_stream_response still being processed here? My current solution is to use a local variable to override the function's stream parameter here, because the LLM node always calls it with stream=True, while the actual stream setting in the YAML configuration is in model_parameters.
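A sketch of that local override (the function and parameter names here are mine, not the actual dify signature):

def resolve_stream_flag(model_parameters: dict, stream: bool = True) -> bool:
    # the LLM node always calls with stream=True, so prefer the value the user
    # actually configured in the YAML-driven model_parameters, if present
    return bool(model_parameters.pop("stream", stream))

# resolve_stream_flag({"temperature": 0.7, "stream": False})  -> False
# resolve_stream_flag({"temperature": 0.7})                   -> True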
I see, my bad. I misunderstood your previous statement.
@AAEE86 @CXwudi Please take a look at my submission. Can it resolve the issue, and are my modifications reasonable? 🐸