Memory Proposal
[WIP - DRAFT] Modeling Distributed Multi-layer Memory as Agents+Events
This is a draft proposal for how we might implement a flexible, dynamic, and configurable memory system on top of AutoGen's event-based actor model.
Design Goals
- An easy-to-use memory system that works out of the box for 80% of scenarios, yet is powerful enough to be customized for the other 20%
- Flexible enough to be adapted and evolved over time to adopt new techniques without API or architecture changes
- Works at scale with thousands of agents
- Allows storage and access control to be delegated to other systems
- Doesn't introduce significant new dependencies or tight coupling to specific storage services or solutions
- Observable
- Resilient
Concept: Memory vs. Knowledge (this doc focuses on memory)
Think of Knowledge as information that is available to the agents independently of any runtime considerations: it's data, facts, etc. that are stored somewhere persistent and provide context to the LLM, but that are not dynamically updated with information generated by the agents at runtime. (Caveat: you may build agents whose job is to build knowledge banks, but that's a different concern.) Memory, for the purposes of this document, is information about events generated during program execution and persisted so that it can be recalled later as context for the LLM.
Concept: Storing Memories involves publishing an event with the memory
In keeping with AutoGen's distributed actor-model event system, this proposal treats storing a memory as publishing it to a specific topic or queue. This allows the memory space to be partitioned in whatever way the application requires. Compared with an RPC call to a data store, the partitioning is lightweight and automatic: it follows from which topic a memory is published to. A special kind of agent, a Memory Bank, is responsible for listening for and persisting these memory messages. Publishing also makes it easy to insert additional agents for things like replay, recording, and audit.
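To make the publish-to-store idea concrete, here is a minimal in-process sketch. It does not use AutoGen's runtime API: `TopicBus`, `MemoryEvent`, `MemoryBank`, and `AuditLog` are hypothetical stand-ins, and the `memory.write.<agent>` topic naming is an assumed convention. It shows that the writing agent only publishes, that partitioning falls out of topic naming, and that an audit subscriber can be added without touching either the writer or the bank.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List

@dataclass
class MemoryEvent:
    source_agent: str
    content: str

Handler = Callable[[MemoryEvent], None]

class TopicBus:
    """Hypothetical stand-in for the runtime: delivers each event to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: MemoryEvent) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

class MemoryBank:
    """Listens on a write topic and persists what it receives (here, just a list)."""

    def __init__(self, bus: TopicBus, write_topic: str) -> None:
        self.stored: List[MemoryEvent] = []
        bus.subscribe(write_topic, self.stored.append)

class AuditLog:
    """An extra subscriber on the same topic, e.g. for audit or replay."""

    def __init__(self, bus: TopicBus, topic: str) -> None:
        self.entries: List[str] = []
        bus.subscribe(topic, lambda e: self.entries.append(f"{e.source_agent}: {e.content}"))

bus = TopicBus()
# Partitioning is only a topic-naming convention: one write topic per agent here.
bank_a = MemoryBank(bus, "memory.write.agent-a")
bank_b = MemoryBank(bus, "memory.write.agent-b")
audit = AuditLog(bus, "memory.write.agent-a")

# Storing a memory means publishing an event; the agent never calls a store directly.
bus.publish("memory.write.agent-a", MemoryEvent("agent-a", "user prefers metric units"))
bus.publish("memory.write.agent-b", MemoryEvent("agent-b", "deploy scheduled for Friday"))
```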
Concept: Memory Banks as Agents in AutoGen's actor-model core
To help understand the proposal, consider the idea of memory banks: implementations of memory storage that continuously apply transforms or operations to the memories stored in them. A memory bank is just an agent, which can be written in any language AutoGen supports, that processes events sent to it, with a promise to store each event in some form and to make it available for retrieval later. Different memory banks may serve different purposes or apply different transforms to the data. Which memory banks to use can be left to the application designer, but through inheritance and composition we can give developers a lot of convenience: a base memory bank agent that abstracts connections to semantic stores (e.g., via the .NET Microsoft.Extensions.AI.Vector abstractions, Semantic Kernel, or LlamaIndex), plus base primitives for managing the subscriptions associated with different types of agent memory. Memory banks could be dedicated to a single agent or shared among multiple agents. Examples of memory banks (more later): Raw Semantic Search Memory, Episodic Memory, Short-Term Memory, Whiteboard Pattern Memory, Long-Term Memory, Procedural Memory, etc.
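The inheritance-plus-composition idea can be sketched as a base class that owns a pluggable store and a pipeline of transforms, with subclasses deciding how retrieval works. This is a hypothetical Python sketch: `MemoryStore`, `DictStore`, `BaseMemoryBank`, and `KeywordBank` are illustrative names, the dictionary store stands in for a real semantic-store abstraction, and transforms are applied once at write time rather than continuously.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Protocol

class MemoryStore(Protocol):
    """Hypothetical storage abstraction; a real bank might wrap a vector store here."""

    def put(self, key: str, value: str) -> None: ...
    def all(self) -> Dict[str, str]: ...

class DictStore:
    """Simplest possible store, used only to keep the sketch self-contained."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def all(self) -> Dict[str, str]:
        return dict(self._data)

Transform = Callable[[str], str]

class BaseMemoryBank(ABC):
    """Composes a store with a transform pipeline; subclasses decide how to retrieve."""

    def __init__(self, store: MemoryStore, transforms: List[Transform]) -> None:
        self.store = store
        self.transforms = transforms
        self._counter = 0

    def on_memory(self, content: str) -> None:
        # Apply every transform, in order, before persisting (write-time only in this sketch).
        for transform in self.transforms:
            content = transform(content)
        self._counter += 1
        self.store.put(f"m{self._counter}", content)

    @abstractmethod
    def retrieve(self, query: str) -> List[str]: ...

class KeywordBank(BaseMemoryBank):
    """A trivial subclass: returns stored memories that contain the query, case-insensitively."""

    def retrieve(self, query: str) -> List[str]:
        return [v for v in self.store.all().values() if query.lower() in v.lower()]

# Transforms here just trim whitespace and cap length at 80 characters.
bank = KeywordBank(DictStore(), transforms=[str.strip, lambda s: s[:80]])
bank.on_memory("  The build failed on Tuesday  ")
bank.on_memory("Retry succeeded after cache flush")
```

Swapping `DictStore` for another implementation of the same two methods leaves `KeywordBank` untouched, which is the kind of separation the base agent is meant to provide.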
A memory bank listens for events published on its write topic and responds (with retrievals) to requests published on its read (retrieval) topic(s). The response can be published either to a shared topic or to a dedicated topic routed to the requesting agent, depending on the needs of the application.
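The read/write split and the caller-chosen reply topic can be sketched as follows. This again assumes a hypothetical in-process `TopicBus` rather than the AutoGen runtime; the `reply_topic` field on `RetrievalRequest` is an assumed message shape, and substring matching stands in for real retrieval.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List

class TopicBus:
    """Hypothetical minimal pub/sub bus standing in for the AutoGen runtime."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        for handler in self._subs[topic]:
            handler(message)

@dataclass
class RetrievalRequest:
    query: str
    reply_topic: str  # where the requester wants the answer published

@dataclass
class RetrievalResponse:
    query: str
    results: List[str]

class MemoryBank:
    def __init__(self, bus: TopicBus, write_topic: str, read_topic: str) -> None:
        self.bus = bus
        self.memories: List[str] = []
        bus.subscribe(write_topic, self.memories.append)  # writes: just persist
        bus.subscribe(read_topic, self.on_retrieve)       # reads: answer on reply topic

    def on_retrieve(self, request: RetrievalRequest) -> None:
        hits = [m for m in self.memories if request.query in m]
        # The bank does not track who asked; it replies on whatever topic the request names.
        self.bus.publish(request.reply_topic, RetrievalResponse(request.query, hits))

bus = TopicBus()
bank = MemoryBank(bus, "team.memory.write", "team.memory.read")
inbox: List[RetrievalResponse] = []
bus.subscribe("agent-a.inbox", inbox.append)  # a per-agent reply topic
bus.publish("team.memory.write", "invoice 42 was approved")
bus.publish("team.memory.write", "invoice 43 is pending")
bus.publish("team.memory.read", RetrievalRequest("invoice 42", "agent-a.inbox"))
```

Replying to a shared topic instead is just a matter of the requester passing a shared topic name as `reply_topic`.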
Concept: Retrieval Strategies
Each inference an agent makes within an application may call on one or more memory banks to augment the inference with information from memory. The choice of which memory banks are used, and in which combinations, constitutes a retrieval strategy. Some inferences may want different retrieval strategies than others (e.g., tool calling may rely almost solely on procedural memory, while intent detection and planning may combine semantic search with episodic memory and the whiteboard). A retrieval strategy can also be modeled as an agent that listens on a specific topic: when a retrieval request comes in, it assembles the set of retrieval messages to send to the memory banks, coordinates collecting the responses, and then responds to the original requesting agent(s) with the consolidated retrieval. This allows retrieval strategies to be adapted and tuned independently of the agents themselves.
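A retrieval strategy agent essentially fans a request out, gathers the replies, and merges them. The sketch below models each bank as an async query function (a hypothetical simplification of sending request messages to bank agents) and uses `asyncio.gather` with `return_exceptions=True` so that one failing bank degrades the result instead of failing it. `RetrievalStrategy`, `make_bank`, `broken_bank`, and the one-second timeout are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical simplification: a bank is an async query function. In AutoGen this
# would be a request message to a bank agent and a response message back.
BankQuery = Callable[[str], Awaitable[List[str]]]

def make_bank(name: str, memories: List[str]) -> BankQuery:
    async def query(q: str) -> List[str]:
        await asyncio.sleep(0)  # yield to the event loop, standing in for I/O
        return [f"[{name}] {m}" for m in memories if q in m]
    return query

async def broken_bank(q: str) -> List[str]:
    raise ConnectionError("bank unavailable")

class RetrievalStrategy:
    """Fans a query out to several banks, gathers the replies, and merges them in bank order."""

    def __init__(self, banks: Dict[str, BankQuery], timeout_s: float = 1.0) -> None:
        self.banks = banks
        self.timeout_s = timeout_s

    async def retrieve(self, query: str) -> List[str]:
        calls = [bank(query) for bank in self.banks.values()]
        # return_exceptions=True turns a failing bank into a value instead of an error.
        # The timeout bounds the whole fan-out; exceeding it raises TimeoutError.
        replies = await asyncio.wait_for(
            asyncio.gather(*calls, return_exceptions=True), timeout=self.timeout_s
        )
        merged: List[str] = []
        for reply in replies:
            if isinstance(reply, BaseException):
                continue  # skip the failed bank; the others still contribute
            merged.extend(reply)
        return merged

planning = RetrievalStrategy({
    "semantic": make_bank("semantic", ["plan: migrate db", "plan: add cache"]),
    "episodic": make_bank("episodic", ["yesterday the plan: migrate db stalled"]),
    "whiteboard": broken_bank,
})
consolidated = asyncio.run(planning.retrieve("migrate"))
```

Because the strategy is its own object (its own agent, in the proposal), the bank mix or merge policy can change without touching the agents that ask for retrievals.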
Built-in memory banks
I propose that we implement a small set of built-in memory banks whose behaviors are well known and matched with a basic set of built-in retrieval strategies.
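One possible way to pair built-in banks with built-in strategies is a lookup table that applications can override. The strategy names, the `Bank` enum, `banks_for`, and the fallback to raw semantic search are hypothetical; the pairings themselves follow the examples given in the retrieval-strategies concept.

```python
from enum import Enum
from typing import Dict, FrozenSet, Optional

class Bank(Enum):
    RAW_SEMANTIC = "raw_semantic"
    EPISODIC = "episodic"
    SHORT_TERM = "short_term"
    WHITEBOARD = "whiteboard"
    PROCEDURAL = "procedural"

# Hypothetical defaults mirroring the examples given for retrieval strategies.
BUILT_IN_STRATEGIES: Dict[str, FrozenSet[Bank]] = {
    "tool_calling": frozenset({Bank.PROCEDURAL}),
    "intent_detection": frozenset({Bank.RAW_SEMANTIC, Bank.EPISODIC, Bank.WHITEBOARD}),
    "planning": frozenset({Bank.RAW_SEMANTIC, Bank.EPISODIC, Bank.WHITEBOARD}),
}

def banks_for(
    inference_kind: str,
    overrides: Optional[Dict[str, FrozenSet[Bank]]] = None,
) -> FrozenSet[Bank]:
    """Application overrides win; unknown kinds fall back to raw semantic search (assumed)."""
    table = {**BUILT_IN_STRATEGIES, **(overrides or {})}
    return table.get(inference_kind, frozenset({Bank.RAW_SEMANTIC}))

tool_banks = banks_for("tool_calling")
```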
Raw Semantic Search
Simply brokers memories to a semantic database via existing abstractions
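A raw semantic search bank mostly stores text alongside an embedding and ranks stored items by similarity to the query. In this sketch a bag-of-words `Counter` and cosine similarity stand in for a real embedding model and vector database; `RawSemanticBank`, `embed`, and `cosine` are hypothetical names, not an existing abstraction.

```python
import math
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real bank would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class RawSemanticBank:
    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def store(self, memory: str) -> None:
        self._items.append((memory, embed(memory)))

    def search(self, query: str, top_k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        # Drop zero-similarity hits so an unrelated query returns nothing.
        return [text for text, vec in ranked[:top_k] if cosine(q, vec) > 0]

bank = RawSemanticBank()
bank.store("the customer asked about refund policy")
bank.store("deployment finished without errors")
bank.store("refund issued to the customer")
results = bank.search("customer refund")
```

The shorter "refund issued to the customer" ranks first because the two shared words make up a larger fraction of it.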
Episodic Memory
Summarizes raw memories, records the sequence of events, and allows for multiple layers of summary during recall.
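Multiple layers of summary can be kept by rolling up every few entries at one layer into a single entry at the next. `EpisodicBank` and `naive_summarize` are hypothetical: the summarizer simply joins text and would be replaced by an LLM call in practice, and the fixed `fanout` is an assumed roll-up policy.

```python
from typing import Callable, List

Summarizer = Callable[[List[str]], str]

def naive_summarize(items: List[str]) -> str:
    """Placeholder summarizer; a real episodic bank would ask an LLM to summarize."""
    return " | ".join(items)

class EpisodicBank:
    """Keeps events in order and rolls every `fanout` entries up into the next layer."""

    def __init__(self, fanout: int = 3, summarize: Summarizer = naive_summarize) -> None:
        self.fanout = fanout
        self.summarize = summarize
        self.layers: List[List[str]] = [[]]    # layers[0] holds raw events, in order
        self._pending: List[List[str]] = [[]]  # per-layer entries not yet rolled up

    def record(self, event: str) -> None:
        self._add(0, event)

    def _add(self, level: int, item: str) -> None:
        if level == len(self.layers):  # first summary at a new, coarser layer
            self.layers.append([])
            self._pending.append([])
        self.layers[level].append(item)
        self._pending[level].append(item)
        if len(self._pending[level]) == self.fanout:
            summary = self.summarize(self._pending[level])
            self._pending[level] = []
            self._add(level + 1, summary)

    def recall(self, level: int) -> List[str]:
        """Level 0 is full detail; higher levels are progressively coarser summaries."""
        return list(self.layers[level]) if level < len(self.layers) else []

ep = EpisodicBank(fanout=2)
for event in ["user logged in", "opened ticket", "ticket escalated", "ticket resolved"]:
    ep.record(event)
```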
Short-Term Memory
A short, processed, focused store of what an agent is doing right now; it could be persisted in an Orleans grain.
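A bounded buffer plus the agent's current goal is one simple shape for short-term memory. This hypothetical `ShortTermMemory` keeps only the most recent `capacity` notes; persisting that state (for example in an Orleans grain, as suggested for this bank) is outside the sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

@dataclass
class ShortTermMemory:
    """Small, bounded working set for one agent; the oldest notes fall off first."""
    capacity: int = 5
    current_goal: str = ""
    recent: Deque[str] = field(init=False, default_factory=deque)

    def __post_init__(self) -> None:
        # Replace the default deque with one bounded by `capacity`.
        self.recent = deque(maxlen=self.capacity)

    def note(self, item: str) -> None:
        self.recent.append(item)

    def snapshot(self) -> List[str]:
        """What an inference would receive as 'what am I doing right now' context."""
        header = [f"goal: {self.current_goal}"] if self.current_goal else []
        return header + list(self.recent)

stm = ShortTermMemory(capacity=2, current_goal="triage ticket 7")
stm.note("read ticket")
stm.note("checked logs")
stm.note("found stack trace")  # evicts "read ticket"
```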
Whiteboard Memory
A shared memory that multiple agents can use to keep track of shared tasks and the decisions made...
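A whiteboard is shared mutable state, so writes need to be serialized and readers benefit from knowing when the board changed. This hypothetical `Whiteboard` uses an in-process lock and a version counter; in the distributed design, the same role would more likely be played by a bank agent processing update events one at a time.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Task:
    title: str
    owner: str
    status: str = "open"

@dataclass
class Decision:
    author: str
    text: str

class Whiteboard:
    """Shared tasks and decisions that several agents read and write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # serializes concurrent writers
        self.tasks: Dict[str, Task] = {}
        self.decisions: List[Decision] = []
        self.version = 0  # bumped on every change so readers can detect staleness

    def upsert_task(self, task_id: str, title: str, owner: str, status: str = "open") -> None:
        with self._lock:
            self.tasks[task_id] = Task(title, owner, status)
            self.version += 1

    def record_decision(self, author: str, text: str) -> None:
        with self._lock:
            self.decisions.append(Decision(author, text))
            self.version += 1

    def open_tasks(self) -> List[str]:
        with self._lock:
            return [tid for tid, t in self.tasks.items() if t.status == "open"]

wb = Whiteboard()
wb.upsert_task("t1", "write tests", "agent-a")
wb.upsert_task("t2", "review PR", "agent-b")
wb.upsert_task("t1", "write tests", "agent-a", status="done")
wb.record_decision("agent-b", "use Postgres")
```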
Procedural Memory
A memory bank that stores knowledge of skills that an agent may leverage.
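Procedural memory can be sketched as a registry of named skills that an agent looks up by need and then invokes, which fits the tool-calling retrieval strategy. `Skill`, `ProceduralMemory`, and the word-overlap lookup are hypothetical; a real bank might match skills semantically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Skill:
    name: str
    description: str
    run: Callable[..., object]

class ProceduralMemory:
    """Registry of skills an agent can look up (e.g. before tool calling) and invoke."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def learn(self, skill: Skill) -> None:
        self._skills[skill.name] = skill  # re-learning a name replaces the old version

    def find(self, need: str) -> List[str]:
        """Names of skills whose description shares at least one word with `need`."""
        words = set(need.lower().split())
        return [s.name for s in self._skills.values()
                if words & set(s.description.lower().split())]

    def invoke(self, name: str, *args: object) -> object:
        return self._skills[name].run(*args)

pm = ProceduralMemory()
pm.learn(Skill("add", "add two numbers", lambda a, b: a + b))
pm.learn(Skill("upper", "convert text to uppercase", lambda s: s.upper()))
```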