
[Feature Request]: Streaming ONLY the final message?

Open · tyler-suard-parker opened this issue 1 year ago · 8 comments

Is your feature request related to a problem? Please describe.

Hello. I am using AutoGen as a retrieval-augmented generation agent. It works fantastically, and it performs multiple searches for different topics when necessary. However, building and sending the final answer takes too long for my users: around 20 seconds out of 30 seconds total. I was hoping there is a way to stream just that one final answer, since it accounts for the majority of the time. I looked at all the open issues and pull requests and I am still not sure of the status of streaming in AutoGen.

Describe the solution you'd like

In the UserProxyAgent class, add a parameter such as stream_final_message=True. The agents would still converse back and forth and pull whatever information is needed, but the final message would be streamed, so users don't have to wait for that message (which tends to be long) to be fully formed.
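
For concreteness, here is a sketch of how the proposed flag might be used. stream_final_message is hypothetical (it is exactly what this issue asks for); the rest is the existing UserProxyAgent constructor:

```python
# Hypothetical sketch of the requested API. stream_final_message does not
# exist in AutoGen today; it is the parameter this issue proposes.
import autogen

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    stream_final_message=True,  # proposed: stream only the last reply, token by token
)
```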

Additional context

No response

tyler-suard-parker (Jan 04 '24)

@thinkall Do you think streaming would help here?

rickyloynd-microsoft (Jan 04 '24)

I am not sure streaming would help here. Interaction between agents in AutoGen is currently sequential, i.e., each agent generates its response, which is then sent to the next agent (written into its message history). This means all previous messages (and their associated latency) must be incurred before the final response is generated. In terms of UX, what could help is showing users the intermediate messages as they are generated on the way to a final answer. Happy to hear more thoughts here.
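
For example, a rough sketch of surfacing intermediate messages as they arrive, using the register_reply hook (autogen ~0.2 assumed; push_to_frontend is a placeholder for your own transport, e.g. a websocket or SSE emit):

```python
import autogen

def push_to_frontend(author: str, content: str) -> None:
    print(f"[{author}] {content}")  # stub: replace with a real emit to the browser

def forward_message(recipient, messages=None, sender=None, config=None):
    # Runs each time `recipient` receives a new message; surface it immediately.
    push_to_frontend(sender.name, messages[-1].get("content", ""))
    return False, None  # False: continue with normal reply generation

user_proxy = autogen.UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
user_proxy.register_reply([autogen.Agent, None], reply_func=forward_message)
```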

victordibia (Jan 05 '24)

@victordibia Thank you for your input. I understand that the interactions between agents are sequential. Our agent interaction is something like this:

  1. Agent receives question (0 seconds)
  2. Agent generates a query (1 second)
  3. Search is performed using query and results are returned (1 second)
  4. Answer to user question is generated using the query results (30 seconds)

I am hoping to stream just step 4 to my frontend, because users are not willing to wait those 30 seconds to receive an answer, and it would be great if they could at least see the first few words immediately, as they would with streaming.
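
Concretely, something like the sketch below is what I am after: run steps 1-3 with the agents, then make the final call with stream=True (this assumes the openai >= 1.0 Python SDK; the model name and prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def stream_final_answer(question: str, search_results: str):
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{search_results}\n\nQuestion: {question}"},
        ],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # each fragment can be pushed to the frontend immediately
```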

tyler-suard-parker (Jan 05 '24)

Ah ... got it. You want to stream responses (in your case, just the last message). I recall there was a PR for streaming. @ragyabraham has extensive experience in that area (he's built a tool that implements this functionality). @ragyabraham, any pointers you can share will be appreciated!

victordibia (Jan 05 '24)

Thank you @victordibia! @ragyabraham, I am sure this is a common use case: I want to be able to stream just the last message to my front end as it is being created. Do you have any suggestions on how I could do that?

tyler-suard-parker (Jan 05 '24)

Hey @tyler-suard-parker, sure. We utilise sockets to stream messages to the FE. We instantiate a socket client and pass it as a callable in the agent config, then use that to emit each message to the FE. If you want more detail, check out our fork of AutoGen. Roughly, the idea is sketched below.
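
A simplified sketch of that socket approach, assuming python-socketio on the backend; the server URL and the "agent_message" event name are placeholders:

```python
import socketio

sio = socketio.Client()
sio.connect("http://localhost:5000")

def emit_to_frontend(author: str, content: str) -> None:
    # Push each agent message to the browser the moment it is produced;
    # this callable is what gets wired into the agent config.
    sio.emit("agent_message", {"author": author, "content": content})
```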

ragyabraham (Jan 05 '24)

@ragyabraham Thank you so much for your help! I was not able to get your branch to run, so I opened an issue. For my use case, I am using a frontend, an Azure Functions app for the backend, and OpenAI. My main concern is the OpenAI generation time: some answers take up to 2 minutes to generate and users are complaining, so I want every word to hit my frontend as it is generated by OpenAI. Would I be able to do that using your fork?

tyler-suard-parker (Jan 06 '24)

+1, looking for the same. How can I stream the final message as it is generated? (Ideally, the intermediate messages could be streamed as well.)

lordlinus (Jan 08 '24)