
Dify assistant app stops generating when responses take a long time

Open perzeuss opened this issue 1 year ago • 8 comments

Self Checks

Dify version

0.5.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  • Create a new "Basic Assistant"
  • Define the system prompt: Generate a long story about pandas. Respond with 10 parts, give each part a headline. Each part has at least 200 words.
  • Select GPT 4 Turbo (currently a slow model)
    • Remove/Disable the Max Tokens
    • Load preset "Creative"
  • Send "Go!" in the debug chat
  • Wait until the generation stops
  • Send "continue"

✔️ Expected Behavior

  • The backend responds until it reaches the max tokens
  • I see the duration and the tokens spent below the answer
  • When I send a new message like "continue" the model continues because it gets the current chat history

❌ Actual Behavior

  • The backend responds until it reaches a failure state

  • I see "0.00s Token spent 0" below the answer (screenshot attached)

  • When I send a new message like "continue", the model starts from the beginning because it does not get the current chat history, which contains the previously generated content

perzeuss avatar Jan 29 '24 16:01 perzeuss

I was not able to reproduce it on cloud, only on my self-hosted instance in Kubernetes.

perzeuss avatar Jan 29 '24 16:01 perzeuss

🤖

Hello @perzeuss! Great to see you back here. I hope you're doing well! Let's see if we can sort out this issue you've run into.

Based on the information you've provided, it seems like the issue you're experiencing might be related to the ApplicationQueueManager class in the Dify codebase.

In the listen method of this class, there's a listen_timeout set to 600 seconds (10 minutes). If the model takes longer than this to generate a response, the listen method will stop listening for new messages and publish a QueueStopEvent, effectively stopping the conversation. This could explain why the assistant stops generating responses when using a slow model like GPT 4 Turbo. You can find this in the api/core/application_queue_manager.py file.

Regarding the issue of the assistant failing to continue from the current chat history after reaching a failure state, it could be due to the publish_error method. When an error occurs, this method publishes a QueueErrorEvent and then stops listening for new messages by calling self.stop_listen(). This could prevent the assistant from continuing the conversation after an error.

To resolve these issues, you might need to adjust the listen_timeout to accommodate slower models and modify the error handling logic to allow the assistant to recover from errors and continue the conversation. However, these changes could have other implications, so they should be carefully tested.

As for the max_tokens parameter, its default value in the Dify configuration is 1024. This parameter affects the generation process by limiting the maximum number of tokens that can be generated in a single output. This means that the generated text will be cut off after 1024 tokens, regardless of whether the model has finished generating a complete thought or sentence. This is important for controlling the length and computational cost of the generation process. You can find this in the api/core/model_runtime/model_providers/__base/tokenizers/gpt2/tokenizer_config.json file.
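As a rough illustration of how max_tokens caps output, here is a minimal sketch of an OpenAI-style streaming chat completion request. The endpoint and field names come from OpenAI's public API and are used only as an assumption about the general mechanism; this is not Dify's internal code.

// Hypothetical sketch: an OpenAI-style streaming request where max_tokens
// caps how much text the model may generate, regardless of whether the
// answer is complete. The API key and model name are placeholders.
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4-turbo-preview',
    messages: [{ role: 'user', content: 'Go!' }],
    max_tokens: 1024, // generation stops once 1024 tokens have been produced
    stream: true,     // tokens arrive incrementally over the HTTP response
  }),
});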

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] avatar Jan 29 '24 16:01 dosubot[bot]

@dosu-bot I enabled debugging for the api and in the console I can see that the model completes the generation after it stops in the frontend. In additional tests, I found out that the frontend cancels the stream after ~60 seconds in 5 of 5 tests.

I found this 60s timeout in the frontend; could this be the issue? https://github.com/langgenius/dify/blob/9d072920da2af99573fe854c7f4175fd87820001/web/app/components/share/text-generation/result/index.tsx#L167
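For context, a fixed client-side timeout around a streaming fetch usually looks roughly like the sketch below: once the timer fires, the request is aborted even if the backend is still producing tokens. This is a simplified, hypothetical illustration of the pattern with a made-up endpoint and payload, not the actual code behind the linked line.

// Hypothetical sketch of a hard 60s stream timeout: if the stream has not
// finished within 60 seconds, it is aborted even though the model may still
// be generating on the server.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 60_000);

try {
  const res = await fetch('/api/chat-messages', { // made-up endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: 'Go!', response_mode: 'streaming' }),
    signal: controller.signal,
  });
  const reader = res.body!.getReader();
  // Read the stream chunk by chunk until the server closes it
  // or the abort signal cancels it.
  while (true) {
    const { done } = await reader.read();
    if (done) break;
  }
} finally {
  clearTimeout(timer);
}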

perzeuss avatar Jan 29 '24 17:01 perzeuss

🤖

Based on the information you've provided, it seems like the issue might be related to the 60s timeout you found in the frontend code. This timeout could be causing the frontend to cancel the stream after ~60 seconds, which would explain why the model generation stops prematurely.

The timeout is likely there to prevent the frontend from waiting indefinitely for a response from the backend. However, in your case, it seems like this timeout might be too short, especially when using a slower model like GPT-4 Turbo.

One possible solution would be to increase this timeout to a higher value. You can do this by modifying the following line of code in the result/index.tsx file:

const timeout = setTimeout(() => {
  if (stream) {
    stream.cancel();
  }
}, 60000);

You can replace 60000 (which represents 60 seconds) with a higher value. For example, if you want to increase the timeout to 5 minutes, you would change 60000 to 300000.

Please note that this is just a potential solution and might not solve the issue completely. There could be other factors at play here, such as network conditions or other frontend/backend configurations, that are causing the stream to be cancelled prematurely.

If modifying the timeout doesn't solve the issue, please provide more information about your frontend configuration and network conditions, so we can further investigate the issue.

For more information about how the timeout works, you can refer to the React documentation on useEffect cleanup.

I hope this helps! If you have any other questions, feel free to ask.

Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] avatar Jan 29 '24 17:01 dosubot[bot]

Maybe we can try the latest code from main by using the tag 34634bddf12701ee31c462711e60fab3d9a71e73 for the Docker images langgenius/dify-api and langgenius/dify-web.

takatost avatar Jan 30 '24 05:01 takatost

Maybe we can try the latest code from main by using the tag 34634bddf12701ee31c462711e60fab3d9a71e73 for the Docker images langgenius/dify-api and langgenius/dify-web.

Same issue with this version. I assume it is a problem with our infrastructure. I'll run some additional tests over the next few days to check which system in our infrastructure is the problem.

However, the bug where the frontend loses the message could be fixed with a reconnect feature. The frontend would then try to reconnect when the response stream stops without a message_end event. @takatost what do you think about that? It would also cover other scenarios like network disconnects.

perzeuss avatar Feb 01 '24 15:02 perzeuss

However, the bug where the frontend loses the message could be fixed with a reconnect feature. The frontend would then try to reconnect when the response stream stops without a message_end event. @takatost what do you think about that? It would also cover other scenarios like network disconnects.

That makes sense. Currently, we are using SSE as the streaming output solution, just like OpenAI. However, it's not standard SSE, which means we can't support reconnection. To enable reconnection, we would need to use standard SSE as a channel for each user or implement WebSocket. Additionally, the backend would also need to be modified to use a shared queue (such as a Redis queue). This could be an additional technical solution in the future, but there are currently no plans for it. For now, it is recommended to investigate the disconnection causes in your own infrastructure.
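To make the difference concrete, here is a hypothetical sketch of what standard SSE reconnection looks like on the client side using the browser's EventSource API. The browser retries dropped connections automatically and sends the last received event id on reconnect, which is what would let a backend that buffers events (for example in a Redis queue) replay missed chunks. The endpoint and helper function are made up for illustration (the message_end event name is taken from the comment above); Dify does not currently support this.

// Placeholder for whatever renders a streamed chunk into the chat UI.
function appendToChat(chunk: unknown) {
  console.log(chunk);
}

// Hypothetical standard-SSE client: the browser reconnects on its own after
// a dropped connection and sends a Last-Event-ID header, so the server can
// resume the stream where it left off.
const source = new EventSource('/api/conversations/123/events'); // made-up endpoint

source.onmessage = (event) => {
  // event.lastEventId is remembered by the browser and replayed on reconnect.
  appendToChat(JSON.parse(event.data));
};

source.addEventListener('message_end', () => {
  source.close(); // generation finished, stop listening
});

source.onerror = () => {
  // No manual retry needed: EventSource reconnects automatically.
  console.warn('SSE connection dropped, the browser will attempt to reconnect');
};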

takatost avatar Feb 02 '24 05:02 takatost

Hi, @perzeuss,

I'm helping the team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue with the Dify assistant app stopping generation when using the GPT 4 Turbo model with a specific prompt and preset. The issue occurred on a self-hosted instance in Kubernetes, with the model completing generation in the backend after the frontend stops. There was a discussion about a 60s timeout in the frontend code, and suggestions were made to increase the timeout value and investigate disconnection causes in the infrastructure. A reconnect feature was proposed to address the frontend losing the message, but there are currently no plans to implement it.

Could you please confirm if this issue is still relevant to the latest version of the repository? If it is, please let the team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or concerns, feel free to reach out.

dosubot[bot] avatar Mar 29 '24 16:03 dosubot[bot]