Support Persistent Task Execution in Autogen Distributed Agent Runtime

Open linznin opened this issue 10 months ago • 3 comments

What feature would you like to be added?

Currently, when using Autogen's Distributed Agent Runtime, tasks are managed with an asyncio.Queue. Because that queue exists only in process memory, queued tasks are lost when the service restarts. To let tasks continue executing after a restart, we propose introducing an external storage mechanism such as a Redis-backed queue.

Proposed Solution

  • Replace or extend the existing asyncio.Queue implementation with an external message queue, such as a Redis-backed queue.
  • Ensure that queued tasks are not lost when the service restarts.
  • Provide configuration options to allow users to choose between in-memory and persistent queues.
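To make the proposal concrete, here is a minimal sketch of how a configurable queue backend might look. It is not the runtime's actual code: `InMemoryTaskQueue`, `RedisTaskQueue`, `make_task_queue`, and the `autogen:tasks` key are hypothetical names chosen for illustration. The Redis variant assumes it is handed an async client exposing `rpush` and `lpop` (as `redis.asyncio.Redis` from redis-py does), so nothing here imports Redis directly.

```python
import asyncio
import json
from typing import Any, Dict, Optional


class InMemoryTaskQueue:
    """Non-persistent backend: queued tasks vanish when the process exits."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    async def put(self, task: Dict[str, Any]) -> None:
        await self._queue.put(task)

    async def get(self) -> Optional[Dict[str, Any]]:
        # Return None instead of blocking when nothing is queued.
        try:
            return self._queue.get_nowait()
        except asyncio.QueueEmpty:
            return None


class RedisTaskQueue:
    """Persistent backend (hypothetical): tasks live in a Redis list.

    `client` is assumed to be an async Redis client with `rpush`/`lpop`.
    """

    def __init__(self, client: Any, key: str = "autogen:tasks") -> None:
        self._client = client
        self._key = key

    async def put(self, task: Dict[str, Any]) -> None:
        # Tasks must be JSON-serializable in this sketch.
        await self._client.rpush(self._key, json.dumps(task))

    async def get(self) -> Optional[Dict[str, Any]]:
        raw = await self._client.lpop(self._key)
        return None if raw is None else json.loads(raw)


def make_task_queue(backend: str = "memory", client: Any = None):
    """Pick a queue backend from a configuration value."""
    if backend == "memory":
        return InMemoryTaskQueue()
    if backend == "redis":
        if client is None:
            raise ValueError("the 'redis' backend needs a client")
        return RedisTaskQueue(client)
    raise ValueError(f"unknown backend: {backend!r}")
```

Two caveats on this sketch: tasks only survive a restart of Redis itself if Redis persistence (RDB or AOF) is enabled, and `lpop` removes a task before it has been processed, so a crash mid-task would still lose it. A fuller design would need some form of acknowledgement or in-flight tracking.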

Why is this needed?

  • Improved reliability and fault tolerance.
  • Ability to recover and continue executing tasks after unexpected failures or restarts.
  • Scalability for distributed agent execution.

linznin avatar Feb 03 '25 03:02 linznin

Great suggestion! I think a Redis implementation is great!

Would you like to look into this and open a draft PR for it?

Redis can provide persistence beyond the task queue: it can also store agent states, so that when the runtime restarts, all the agents are ready to go as well.

Perhaps to start we can focus on the single-threaded agent runtime -- it is much better defined.
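As a rough illustration of the agent-state idea above, the sketch below saves and restores per-agent state through a key-value store. `AgentStateStore` and the `autogen:agent_state:` key prefix are hypothetical names, not part of AutoGen. The `client` is assumed to be an async client with `set` and `get` (as `redis.asyncio.Redis` provides), and the state is assumed to be a JSON-serializable dict.

```python
import json
from typing import Any, Dict, Optional


class AgentStateStore:
    """Hypothetical helper that keeps agent state in an external store."""

    def __init__(self, client: Any, prefix: str = "autogen:agent_state:") -> None:
        self._client = client
        self._prefix = prefix

    async def save(self, agent_id: str, state: Dict[str, Any]) -> None:
        # One key per agent; the value is the JSON-encoded state.
        await self._client.set(self._prefix + agent_id, json.dumps(state))

    async def load(self, agent_id: str) -> Optional[Dict[str, Any]]:
        # Returns None when no state was ever saved for this agent.
        raw = await self._client.get(self._prefix + agent_id)
        return None if raw is None else json.loads(raw)
```

On startup, a runtime could call `load` for each registered agent and fall back to a fresh state when it returns `None`.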

ekzhu avatar Feb 03 '25 04:02 ekzhu

@ekzhu could you please tell me if there are any updates on this item? I think this is quite closely related to https://github.com/microsoft/autogen/discussions/4892, and I can help with this once the structured message feature is complete.

abhinav-aegis avatar Apr 14 '25 09:04 abhinav-aegis

@linznin had a draft PR #5371 but I think it's still an open design question.

Happy to start a discussion on this.

ekzhu avatar Apr 14 '25 18:04 ekzhu