How to fix Gemini stream mode?
@CXwudi Adding config_kwargs.pop("stream", None) to config_kwargs resolves the error, but the output is still not streamed. The _handle_generate_stream_response code seems to have a problem.
Originally posted by @AAEE86 in https://github.com/langgenius/dify/issues/8678#issuecomment-2391088192
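For context, here is a minimal, hypothetical sketch (not taken from the dify source; the model name and parameters are made up) of why that workaround is needed: the SDK's GenerationConfig presumably rejects a stream key, so the flag has to be popped out of config_kwargs and passed to generate_content() separately rather than silently dropped.

import os
import google.generativeai as genai  # the SDK the Google provider wraps; usage here is illustrative

config_kwargs = {"temperature": 0.7, "stream": True}  # hypothetical parameters
# keep the flag instead of discarding it, so streaming can still be requested
stream = bool(config_kwargs.pop("stream", True))

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
google_model = genai.GenerativeModel("gemini-1.5-pro")  # example model name
response = google_model.generate_content(
    "Hello",
    generation_config=genai.types.GenerationConfig(**config_kwargs),
    stream=stream,
)
for chunk in (response if stream else [response]):
    print(chunk.text)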
To address the issue of Gemini stream mode not producing streamed output, you should focus on the _handle_chat_generate_stream_response method, which is responsible for processing the streaming response. Here are a few things to check and potentially fix:
- Ensure the Response is Properly Iterated: The method iterates over the response object, which should be an Iterator. Make sure that the response object passed to this method is indeed an iterator and is yielding data as expected.
- Check the Data Parsing: The method attempts to parse each line of the response as JSON. If the response lines are not in the expected format, this could cause issues. Ensure that the response lines are correctly formatted JSON strings.
- Handle the End of Stream: The method checks for a line that equals "[DONE]" to determine the end of the stream. Ensure that this condition is correctly handled and that the stream is properly closed or returned when this line is encountered.
- Yielding the Stream Data: The method yields LLMResultChunk objects for each piece of data. Ensure that choice["delta"]["content"] is not empty before yielding, as the current code skips empty content.
Here is a snippet of the relevant code for reference:
def _handle_chat_generate_stream_response(
    self,
    model: str,
    prompt_messages: list[PromptMessage],
    credentials: dict,
    response: Iterator,
) -> Generator:
    for line in response:
        if not line:
            continue
        line = line.decode("utf-8")
        # remove the leading `data: ` prefix from each SSE line
        if line.startswith("data:"):
            line = line[5:].strip()
        try:
            data = json.loads(line)
        except Exception:
            if line.strip() == "[DONE]":
                return
            # skip lines that are neither valid JSON nor the end-of-stream
            # marker; falling through here would reference an undefined `data`
            continue
        choices = data.get("choices", [])
        stop_reason = ""
        for choice in choices:
            if choice.get("finish_reason"):
                stop_reason = choice["finish_reason"]
            if len(choice["delta"]["content"]) == 0:
                continue
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(content=choice["delta"]["content"], tool_calls=[]),
                    finish_reason=stop_reason,
                ),
            )
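For comparison, here is a standalone, runnable sketch of the same parsing pattern; the data: lines below are made up, assuming an OpenAI-compatible SSE payload:

import json

fake_stream = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": "stop"}]}',
    b"data: [DONE]",
]

def iter_contents(lines):
    # mirrors the handler above: strip "data: ", stop on "[DONE]", parse JSON,
    # and yield only non-empty delta contents
    for raw in lines:
        line = raw.decode("utf-8")
        if line.startswith("data:"):
            line = line[5:].strip()
        if line == "[DONE]":
            return
        data = json.loads(line)
        for choice in data.get("choices", []):
            content = choice.get("delta", {}).get("content", "")
            if content:
                yield content, choice.get("finish_reason")

for content, finish_reason in iter_contents(fake_stream):
    print(content, finish_reason)  # prints "Hel None", then "lo stop"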
Ensure that the response from the model is correctly formatted and that the streaming logic is correctly implemented in the _handle_chat_generate_stream_response method [1].
To continue talking to Dosu, mention @dosu.
We need to know how the stream parameter was initially passed in. I can override stream inside the function to fix the stream=True case, but when stream is False the non-streaming result still won't be displayed, because the external stream defaults to True.
When I changed _handle_generate_response to also return its result in stream format, the stream=False case was fixed as well. This suggests that the handling of the return value branches on the stream value. However, I'm currently unsure how to pass the original stream value into the modified function.
def _handle_invoke_result(
self, invoke_result: LLMResult | Generator
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
"""
Handle invoke result
:param invoke_result: invoke result
:return:
"""
if isinstance(invoke_result, LLMResult):
return
Hmm, okay, I found the place where it is handled. They didn't consider the stream=False case at all... Even though they require non-streaming handling when adding an LLM, they completely ignore the non-streaming return value.
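One possible direction, shown here only as a sketch (the entity classes and field names are assumed from dify's llm_entities, and this is not the actual fix in any submission), is to wrap the non-streaming LLMResult into a one-chunk generator instead of returning early, so the existing chunk-consuming path also covers stream=False:

from collections.abc import Generator

# import path and fields assumed from api/core/model_runtime/entities/llm_entities.py
from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta

def as_chunk_stream(result: LLMResult) -> Generator[LLMResultChunk, None, None]:
    # re-emit the blocking result as a single streaming chunk so code that
    # iterates over chunks keeps working when stream=False
    yield LLMResultChunk(
        model=result.model,
        prompt_messages=result.prompt_messages,
        system_fingerprint=result.system_fingerprint,
        delta=LLMResultChunkDelta(
            index=0,
            message=result.message,
            usage=result.usage,
            finish_reason="stop",
        ),
    )

_handle_invoke_result could then iterate over as_chunk_stream(invoke_result) instead of returning nothing.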
I don't quite understand all of this. Can you fix this issue? @Hisir0909
Hi @Hisir0909, I am just adding my two cents here: _handle_generate_response has nothing to do with this when stream=False. Based on
https://github.com/langgenius/dify/blob/7121afdd4426648b99055b7041b5e99bc7b1ad3a/api/core/model_runtime/model_providers/google/llm/llm.py#L217-L220
_handle_generate_stream_response is the method that handles the stream output. However, it correctly returns a generator type, which really confuses me.
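For anyone following along, the dispatch those lines perform can be pictured with this self-contained sketch using stub handlers (it is not the real dify code):

from collections.abc import Generator

def handle_generate_response(response) -> str:
    return "full response"  # stub: blocking path, returns everything at once

def handle_generate_stream_response(response) -> Generator[str, None, None]:
    yield "chunk"  # stub: streaming path, yields pieces as they arrive

def generate(response, stream: bool = True):
    # stream=True hands back a generator; stream=False hands back the final result
    if stream:
        return handle_generate_stream_response(response)
    return handle_generate_response(response)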
What do you mean? When stream=False, is _handle_generate_stream_response still being processed here? My current solution is to use a local variable to override the function's stream parameter here, because the LLM node always calls it with stream=True, while the actual stream setting in the YAML configuration is in model_parameters.
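A sketch of that local override (the function and parameter names here are mine, not the actual dify signature):

def resolve_stream_flag(model_parameters: dict, stream: bool = True) -> bool:
    # the LLM node always calls with stream=True, so prefer the value the user
    # actually configured in the YAML-driven model_parameters, if present
    return bool(model_parameters.pop("stream", stream))

# resolve_stream_flag({"temperature": 0.7, "stream": False})  -> False
# resolve_stream_flag({"temperature": 0.7})                   -> True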
I see, my bad. I misunderstood your previous statement.
@AAEE86 @CXwudi Please take a look at my submission. Can it resolve the issue, and are my modifications reasonable? 🐸