[Draft, Feedback Needed] Memory in AgentChat
Memory for AgentChat Agents
It would be useful to have some notion of memory, and the ability to attach memory to an agent.
Right now the AssistantAgent can take tools.

```python
agent = Agent(model=model, tools=[])
```
Some use cases benefit from being able to retrieve memory just in time and add it to the prompt before responding (RAG, etc.).

```python
agent = Agent(model=model, tools=[], memory=[])
```
This PR implements:
- a `Memory` protocol
- `ListMemory` - a simple memory based on a list and basic similarity matching.
- `ChromaDBMemory` - implemented using ChromaDB, with similar expected behaviour for other vector DB offerings such as Pinecone, ScaNN, FAISS, MongoDB, etc. (This implementation is included mostly as an example and might be removed and added somewhere else, e.g., in autogen_ext or a third-party repo.)
Memory Behaviour
A `Memory` protocol that developers can overload:

```python
from typing import Any, List, Protocol, Union, runtime_checkable

from autogen_core import CancellationToken, Image

# BaseMemoryConfig, MemoryEntry and MemoryQueryResult are the supporting
# types defined alongside the protocol (sketched below). Import locations
# may differ slightly across autogen versions.


@runtime_checkable
class Memory(Protocol):
    """Protocol defining the interface for memory implementations."""

    @property
    def name(self) -> str | None:
        """The name of this memory implementation."""
        ...

    @property
    def config(self) -> BaseMemoryConfig:
        """The configuration for this memory implementation."""
        ...

    async def query(
        self,
        query: Union[str, Image, List[Union[str, Image]]],
        cancellation_token: CancellationToken | None = None,
        **kwargs: Any,
    ) -> List[MemoryQueryResult]:
        """
        Query the memory store and return relevant entries.

        Args:
            query: Text, image or multimodal query
            cancellation_token: Optional token to cancel operation
            **kwargs: Additional implementation-specific parameters

        Returns:
            List of memory entries with relevance scores
        """
        ...

    async def add(
        self,
        entry: MemoryEntry,
        cancellation_token: CancellationToken | None = None,
    ) -> None:
        """
        Add a new entry to memory.

        Args:
            entry: The memory entry to add
            cancellation_token: Optional token to cancel operation
        """
        ...

    async def clear(self) -> None:
        """Clear all entries from memory."""
        ...

    async def cleanup(self) -> None:
        """Clean up any resources used by the memory implementation."""
        ...
```
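The protocol references `BaseMemoryConfig`, `MemoryEntry` and `MemoryQueryResult` without defining them above. A minimal sketch of what these supporting types might look like, assuming pydantic models; the exact field names and defaults in the PR may differ:

```python
from datetime import datetime
from typing import Union

from pydantic import BaseModel, ConfigDict, Field

from autogen_core import Image


class BaseMemoryConfig(BaseModel):
    """Sketch: base configuration shared by memory implementations."""

    k: int = 5  # maximum number of results returned by a query
    score_threshold: float | None = None  # optional minimum relevance score


class MemoryEntry(BaseModel):
    """Sketch: a single item stored in memory."""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    content: Union[str, Image]
    source: str | None = None  # provenance, e.g. "travel_guide"
    timestamp: datetime = Field(default_factory=datetime.now)


class MemoryQueryResult(BaseModel):
    """Sketch: a memory entry paired with its relevance score."""

    model_config = ConfigDict(arbitrary_types_allowed=True)

    entry: MemoryEntry
    score: float
```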
Integrating with AssistantAgent
Perhaps the biggest change in this PR is how `AssistantAgent` is extended to use memory.

- `AssistantAgent` will try to query memory using `messages[-1]` in `on_messages_stream` (if it is a `TextMessage` or `MultiModalMessage`); the returned result is appended to `model_context` (see the sketch after this list).
- The `AssistantAgent` implementation above focuses on `memory.query` and adds the result JIT (just in time) to the agent context. It does not concern itself much with how entries are added to memory, the reason being that this can be heavily use-case driven. It is expected that the developer will run `memory.add` outside of the agent logic.
- Developers can implement their own custom memory classes by implementing the `Memory` protocol.
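To make the query step concrete, here is a minimal sketch of the logic described in the first bullet, written as a standalone helper. The function name, the append-based injection, and the use of `SystemMessage` are assumptions for illustration; the actual attribute names and insertion point in the PR may differ:

```python
from typing import List, Sequence

from autogen_agentchat.messages import ChatMessage, MultiModalMessage, TextMessage
from autogen_core import CancellationToken
from autogen_core.models import LLMMessage, SystemMessage


async def inject_memory_results(
    memory: Memory,
    messages: Sequence[ChatMessage],
    model_context: List[LLMMessage],
    cancellation_token: CancellationToken | None = None,
) -> None:
    """Sketch of the JIT step: query memory with the latest message and
    extend the model context with the results before inference."""
    last_message = messages[-1]
    if not isinstance(last_message, (TextMessage, MultiModalMessage)):
        return
    results = await memory.query(last_message.content, cancellation_token=cancellation_token)
    for result in results:
        # The PR inserts retrieved entries right after the system prompt;
        # appending at the end here is a simplification.
        model_context.append(SystemMessage(content=str(result.entry.content)))
```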
Example Implementation
An example notebook highlights these:
```python
# NOTE: import paths assume the autogen 0.4-style packages and may differ by version.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.memory._base_memory import MemoryEntry
from autogen_agentchat.memory._chroma_memory import ChromaMemory, ChromaMemoryConfig
from autogen_agentchat.task import Console, MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models import OpenAIChatCompletionClient

# Initialize memory
chroma_memory = ChromaMemory(
    name="travel_memory",
    config=ChromaMemoryConfig(
        collection_name="travel_facts",
        # Configure number of results to return instead of similarity threshold
        k=1,
    ),
)

# Add some travel-related memories
await chroma_memory.add(MemoryEntry(
    content="Paris is known for the Eiffel Tower and amazing cuisine.",
    source="travel_guide",
))
await chroma_memory.add(MemoryEntry(
    content="The most important thing about Tokyo is that it has the world's busiest railway station - Shinjuku Station.",
    source="travel_facts",
))

# Create agent with memory
agent = AssistantAgent(
    name="travel_agent",
    model_client=OpenAIChatCompletionClient(
        model="gpt-4o",
        # api_key="your_api_key"
    ),
    memory=chroma_memory,
    system_message="You are a travel expert",
)

agent_team = RoundRobinGroupChat([agent], termination_condition=MaxMessageTermination(max_messages=2))
stream = agent_team.run_stream(task="Tell me the most important thing about Tokyo.")
await Console(stream)
```
```
---------- user ----------
Tell me the most important thing about Tokyo.
---------- travel_agent ----------
One of the most important aspects of Tokyo is that it has the world's busiest railway station, Shinjuku Station. This station serves as a major hub for transportation, with millions of commuters and travelers passing through its complex network of train lines each day. It highlights Tokyo's status as a bustling metropolis with an advanced public transportation system.
[Prompt tokens: 72, Completion tokens: 66]
---------- Summary ----------
Number of messages: 2
Finish reason: Maximum number of messages 2 reached, current message count: 2
Total prompt tokens: 72
Total completion tokens: 66
Duration: 1.47 seconds
```
Related issue number
Closes #4039, #4648
TBD
- [ ] Finalize design
- [ ] Add tests
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.
Open Questions
- Should memory be a list or a single object? I.e., `memory = [chroma_memory]` or `memory = chroma_memory`? E.g., should an agent have the opportunity to "dip" into several memory banks? (A sketch of the multi-bank variant follows below.)
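If memory becomes a list, the agent could query each bank in turn and merge the results. A minimal sketch, assuming the `Memory` protocol above; the score-based merge policy here is purely illustrative and assumes scores are comparable across banks:

```python
from typing import List


async def query_memory_banks(
    memories: List[Memory], query: str, k: int = 5
) -> List[MemoryQueryResult]:
    """Sketch: query several memory banks and keep the top-k results overall."""
    results: List[MemoryQueryResult] = []
    for memory in memories:
        results.extend(await memory.query(query))
    # Merge across banks by relevance score (assumes comparable scoring).
    results.sort(key=lambda r: r.score, reverse=True)
    return results[:k]
```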
I agree with @husseinmozannar that this is a reasonable and very clean implementation of this idea -- if perhaps a little restrictive. I like the idea of passing the entire context (or perhaps even state!) to the query engine. It's also worth thinking if we can somehow parameterize how the memory is added to the context at the time of the inference. E.g., this implementation adds memory right after the system prompt, and without any explanation or preamble. Other implementations are also reasonable. For example you could introduce memory with something like: "As you work through the user's request, the following snippets may, or may not, be helpful:" You could decide to include memory as the second-to-last message, or the last message (rather than the second). In AutoGen 0.2, we had the idea of context transformers. I wonder if something similar could work here.
I second @afourney and @husseinmozannar's suggestion. I think the query method forces the caller of the memory (e.g., AssistantAgent) to make an upfront choice about how memory is queried and added to the context.
How about letting the memory protocol provide a transform method that takes a model context (i.e., a list of LLMMessage, tool calls, etc.) and returns a transformed model context that can be sent to the model client directly? This way the caller of the memory module doesn't need to make an opinionated decision about how to query and how to use the result; rather, we can leave this decision to the memory module itself, and the caller of AssistantAgent can choose from a preset or customize this from the application. (A sketch of such a method is below.)
There is a ModelContext module in the Core API that is barely used, perhaps we can refine that one and make it work side-by-side with the memory protocol
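For illustration, such a transform-style protocol might look like the following. This is a sketch of the suggestion, not code from the PR; `LLMMessage` is the core message union type, and the `MemoryTransform` name is hypothetical:

```python
from typing import List, Protocol, runtime_checkable

from autogen_core.models import LLMMessage


@runtime_checkable
class MemoryTransform(Protocol):
    """Sketch: memory that rewrites the model context instead of answering queries."""

    async def transform(self, model_context: List[LLMMessage]) -> List[LLMMessage]:
        """Return a new model context with relevant memories injected.

        The memory module decides where and how to insert entries
        (e.g., right after the system prompt, with a preamble such as
        "the following snippets may, or may not, be helpful", or as
        the last message), so the caller stays unopinionated.
        """
        ...
```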
Chatted with @victordibia. The API is nice and clean and I agree with its usefulness.
It would be useful to have the following somewhere in the repo, but not in the base protocol:
- an example of memory-related events being raised for observability
- an example of an agent selectively calling `.pop` on the memory
- an example of an agent selectively calling `.add` on the memory -- this replicates the memory feature in the ChatGPT UI (a sketch of this follows below)
- a full-fledged RAG agent implemented using this protocol. I would like to be able to add the AutoGen repo to it and ask questions.
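As a concrete illustration of the selective `.add` case, the application around the agent could decide at runtime whether a user message is worth remembering. A minimal sketch; `should_remember` is a hypothetical heuristic, not part of the PR:

```python
def should_remember(text: str) -> bool:
    # Illustrative only: remember explicit requests to remember something.
    return text.lower().startswith("remember that")


async def maybe_remember(memory: Memory, user_text: str) -> None:
    """Sketch: selectively persist user messages, replicating ChatGPT-style memory."""
    if should_remember(user_text):
        await memory.add(MemoryEntry(content=user_text, source="user"))
```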
Adding my 2 cents here. I think it would be interesting to have a lower-level abstraction for storage and information types from which MimeType and MemoryContent are derived. There may be some differences between knowledge-base retrieval and memory retrieval that may be useful to consider when creating these abstractions.
I think it could be useful to think about how memory uses storage: have Chroma as a storage implementation that some VectorEmbeddingMemory uses, so users can easily swap in whatever vector database they want. The storage abstraction could then possibly be adapted to any knowledge-base retrievers we decide to implement. Most other agentic frameworks such as Semantic Kernel and LangChain also have abstractions at the storage layer, and it may be easier for us to create adapters this way. (A sketch of this layering is below.)
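To make the layering suggestion concrete, here is a sketch of a storage protocol that a VectorEmbeddingMemory could delegate to. All names besides VectorEmbeddingMemory (which is from the comment above) are hypothetical, proposed for discussion rather than taken from the PR:

```python
from typing import Callable, List, Protocol, Sequence


class VectorStore(Protocol):
    """Sketch: the storage layer a VectorEmbeddingMemory would delegate to."""

    async def upsert(
        self, ids: Sequence[str], embeddings: Sequence[List[float]], payloads: Sequence[dict]
    ) -> None:
        ...

    async def search(self, embedding: List[float], k: int) -> List[tuple[str, float, dict]]:
        """Return (id, score, payload) triples for the k nearest entries."""
        ...


class VectorEmbeddingMemory:
    """Sketch: a memory built on any VectorStore plus an embedding function.

    Swapping Chroma for Pinecone, FAISS, etc. means swapping only the VectorStore."""

    def __init__(self, store: VectorStore, embed: Callable[[str], List[float]]) -> None:
        self._store = store  # e.g. a Chroma-backed implementation
        self._embed = embed

    async def query(self, query: str, k: int = 5) -> List[tuple[str, float, dict]]:
        return await self._store.search(self._embed(query), k=k)
```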
@lspinheiro,
The AgentChat framework will likely only have the `Memory` protocol; developers should overload it to implement whatever vector, graph, or other type of just-in-time memory they need for their agent.
> I think it would be interesting to have a lower-level abstraction for storage and information types from which MimeType and MemoryContent are derived
Good idea, can you propose some concrete examples?
> I think it could be useful to think about how memory uses storage: have Chroma as a storage implementation that some VectorEmbeddingMemory uses, so users can easily swap in whatever vector database they want.
I think I understand your comment here, i.e., that VectorEmbeddingMemory is a general enough case that we should explore some standardized implementation that enables easily switching out various standard DBs. One thing to note is that the APIs for these DBs are so different that there will still be quite a bit of code written specifically for each. That being said, perhaps we can get the base Memory protocol done in this PR and then open a new issue for designing something for VectorEmbeddingMemory.
Agree with @victordibia here, let's focus on the memory protocol first before worrying about the implementation-level stuff.
Furthermore, I would argue that we should be careful not to introduce too many abstractions.
> open a new issue for designing something for VectorEmbeddingMemory

We should take a look at Semantic Kernel's vector memory abstraction and consider adopting it or duck-typing it.
Codecov Report
Attention: Patch coverage is 89.79592% with 10 lines in your changes missing coverage. Please review.
Project coverage is 69.08%. Comparing base (d883e3d) to head (cb3b051). Report is 1 commit behind head on main.
```
@@            Coverage Diff             @@
##             main    #4438      +/-   ##
==========================================
+ Coverage   68.86%   69.08%   +0.21%
==========================================
  Files         156      159       +3
  Lines       10252    10350      +98
==========================================
+ Hits         7060     7150      +90
- Misses       3192     3200       +8
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | 69.08% <89.79%> (+0.21%) | :arrow_up: |
I believe the memory interface should be in Core. It can be used independently of AgentChat agents.
@jackgerrits, thoughts on moving this to core? I agree with Eric that this could be general enough to be used across the framework.
If so, any suggestions on location in core that might fit?
Yeah I think core makes sense. Perhaps `autogen_core.memory`? We've tended to go flat in core now, but perhaps it makes sense to bundle these? Also open to them just being flat/in the root namespace.
> @jackgerrits, thoughts on moving this to core? I agree with Eric that this could be general enough to be used across the framework. If so, any suggestions on location in core that might fit?

> Yeah I think core makes sense. Perhaps `autogen_core.memory`? We've tended to go flat in core now, but perhaps it makes sense to bundle these? Also open to them just being flat/in the root namespace.
I think `autogen_core.memory`. For AI-related components, let's use submodules.
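If the protocol lands in a core submodule as suggested, usage would presumably look like this (hypothetical, since the move had not happened at the time of this discussion):

```python
# Hypothetical import layout once the memory protocol moves into core.
from autogen_core.memory import Memory, MemoryEntry
```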