Support Persistent Task Execution in Autogen Distributed Agent Runtime
What feature would you like to be added?
Currently, Autogen's Distributed Agent Runtime manages tasks with asyncio's Queue. Because that queue lives only in process memory, queued tasks are lost when the service restarts. To let tasks continue executing after a restart, we propose introducing an external storage mechanism such as a Redis queue.
Proposed Solution
- Replace or extend the existing `asyncio.Queue` implementation with an external message queue, such as a Redis queue.
- Ensure that queued tasks are not lost when the service restarts.
- Provide configuration options to allow users to choose between in-memory and persistent queues.
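
To make the proposal concrete, here is a rough sketch of what a pluggable queue could look like. Everything in it is an assumption for illustration: `TaskQueue`, `InMemoryTaskQueue`, `SqliteTaskQueue`, and `make_task_queue` are hypothetical names, not part of Autogen's API. SQLite from the standard library stands in for Redis so the example runs without a server; a Redis-backed class would implement the same three methods.

```python
import asyncio
import json
import os
import sqlite3
import tempfile
from typing import Any, Dict, Optional, Protocol


class TaskQueue(Protocol):
    """Hypothetical interface that both backends implement."""

    async def put(self, task: Dict[str, Any]) -> None: ...
    async def get(self) -> Optional[Dict[str, Any]]: ...
    def close(self) -> None: ...


class InMemoryTaskQueue:
    """Thin wrapper around asyncio.Queue; contents are lost on restart."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    async def put(self, task: Dict[str, Any]) -> None:
        await self._queue.put(task)

    async def get(self) -> Optional[Dict[str, Any]]:
        # Return None instead of blocking when nothing is queued.
        try:
            return self._queue.get_nowait()
        except asyncio.QueueEmpty:
            return None

    def close(self) -> None:
        pass  # nothing to release


class SqliteTaskQueue:
    """Persistent stand-in for a Redis queue (assumption: any durable store works)."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._conn.commit()

    async def put(self, task: Dict[str, Any]) -> None:
        # Blocking call kept for brevity; a real backend would use an async client.
        self._conn.execute(
            "INSERT INTO tasks (payload) VALUES (?)", (json.dumps(task),)
        )
        self._conn.commit()

    async def get(self) -> Optional[Dict[str, Any]]:
        # Pop the oldest task (FIFO), or return None when the table is empty.
        row = self._conn.execute(
            "SELECT id, payload FROM tasks ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self._conn.execute("DELETE FROM tasks WHERE id = ?", (row[0],))
        self._conn.commit()
        return json.loads(row[1])

    def close(self) -> None:
        self._conn.close()


def make_task_queue(backend: str = "memory", path: str = "tasks.db") -> TaskQueue:
    """Hypothetical configuration hook for choosing a backend."""
    if backend == "memory":
        return InMemoryTaskQueue()
    if backend == "sqlite":
        return SqliteTaskQueue(path)
    raise ValueError(f"unknown queue backend: {backend!r}")


async def demo() -> Optional[Dict[str, Any]]:
    """Enqueue a task, simulate a restart, and read the task back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tasks.db")
        before_restart = make_task_queue("sqlite", path)
        await before_restart.put({"agent": "writer", "message": "draft summary"})
        before_restart.close()  # simulate the service shutting down

        after_restart = make_task_queue("sqlite", path)
        task = await after_restart.get()
        after_restart.close()
        return task


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the in-memory backend the same restart would lose the task, which is the gap this issue is about. Keeping both backends behind one interface is what would let users pick between them through configuration.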
Why is this needed?
- Improved reliability and fault tolerance.
- Ability to recover and continue executing tasks after unexpected failures or restarts.
- Scalability for distributed agent execution.
Great suggestion! I think a Redis implementation is great!
Would you like to look into this and open a draft PR for it?
Redis supports persistence beyond the tasks themselves: it can also be used to store the agent states, so when the runtime restarts it will have all the agents ready to go as well.
Perhaps to start we can focus on the single-threaded agent runtime -- it is much better defined.
@ekzhu could you please tell me if there are any updates on this item? I think this is quite closely related to https://github.com/microsoft/autogen/discussions/4892 and I can help with this after the structured message feature is complete.
@linznin had a draft PR #5371 but I think it's still an open design question.
Happy to start a discussion on this.