Memory Interface for AgentChat agents
What feature would you like to be added?
Memory for AgentChat Agents
It would be useful to have some notion of memory, and the ability to attach memory to an agent.
Right now the AssistantAgent can take tools:

```python
agent = Agent(model=model, tools=[])
```
Some use cases benefit from being able to retrieve memory just in time and add it to the prompt before responding (RAG, etc.):

```python
agent = Agent(model=model, tools=[], memory=[])
```
## Memory Behaviour

A default behaviour might be that:

- In `on_messages(task)`, the task is used as a query to `memory.query()` if memory is provided to the agent; the response is added to the prompt before the LLM responds.
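The default flow above could be sketched as follows. `FakeMemory` and `build_prompt` are illustrative stand-ins for this issue, not part of the proposed API:

```python
from typing import List, Optional

# Hypothetical sketch of the default behaviour: the incoming task text is used
# as a query against memory, and any hits are prepended to the prompt before
# the LLM is called. Names here are illustrative, not the real API.
class FakeMemory:
    def __init__(self, entries: List[str]) -> None:
        self._entries = entries

    def query(self, text: str) -> List[str]:
        # Naive relevance: return entries sharing a word with the query.
        words = set(text.lower().replace(".", "").split())
        return [
            e for e in self._entries
            if words & set(e.lower().replace(".", "").split())
        ]

def build_prompt(task: str, memory: Optional[FakeMemory]) -> str:
    # Query memory with the task and prepend any results to the prompt.
    context = ""
    if memory is not None:
        results = memory.query(task)
        if results:
            context = "Relevant memory:\n" + "\n".join(results) + "\n\n"
    return context + task

memory = FakeMemory(["Tokyo has the world's busiest railway station."])
prompt = build_prompt("Tell me about Tokyo.", memory)
```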
A rough sketch of a memory protocol:

```python
from typing import Any, List, Protocol, Union, runtime_checkable

# CancellationToken and Image come from the autogen core packages;
# BaseMemoryConfig, MemoryEntry and MemoryQueryResult are part of the
# proposed memory module.

@runtime_checkable
class Memory(Protocol):
    """Protocol defining the interface for memory implementations."""

    @property
    def name(self) -> str | None:
        """The name of this memory implementation."""
        ...

    @property
    def config(self) -> BaseMemoryConfig:
        """The configuration for this memory implementation."""
        ...

    async def query(
        self,
        query: Union[str, Image, List[Union[str, Image]]],
        cancellation_token: CancellationToken | None = None,
        **kwargs: Any,
    ) -> List[MemoryQueryResult]:
        """
        Query the memory store and return relevant entries.

        Args:
            query: Text, image or multimodal query
            cancellation_token: Optional token to cancel operation
            **kwargs: Additional implementation-specific parameters

        Returns:
            List of memory entries with relevance scores
        """
        ...

    async def add(
        self,
        entry: MemoryEntry,
        cancellation_token: CancellationToken | None = None,
    ) -> None:
        """
        Add a new entry to memory.

        Args:
            entry: The memory entry to add
            cancellation_token: Optional token to cancel operation
        """
        ...

    async def clear(self) -> None:
        """Clear all entries from memory."""
        ...

    async def cleanup(self) -> None:
        """Clean up any resources used by the memory implementation."""
        ...
```
- AssistantAgent will try to query memory using the last message in `on_messages` (if it is a `TextMessage` or `MultiModalMessage`); the returned result is appended to the content.
- Developers can implement their own custom memory classes by implementing the `Memory` protocol.
- The `AssistantAgent` implementation above focuses on `memory.query` and adds the results just in time to the context. It does not concern itself much with how entries are added to memory; the reason is that this can be heavily use-case driven, so the developer is expected to add to memory outside of agent logic.
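The write path could then live entirely at the application level, for example filtering which turns are worth persisting after a run. A sketch under that assumption, with stand-in types (`MemoryEntry`, `SimpleStore`, and the length-based policy are all illustrative):

```python
from dataclasses import dataclass
from typing import List

# Stand-ins for illustration; real types come from the proposed memory module.
@dataclass
class MemoryEntry:
    content: str
    source: str

class SimpleStore:
    def __init__(self) -> None:
        self.entries: List[MemoryEntry] = []

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

def persist_facts(messages: List[str], store: SimpleStore) -> int:
    """Application-level write path: the application, not the agent, decides
    what to remember (here, a naive length filter as the policy)."""
    kept = 0
    for msg in messages:
        if len(msg.split()) >= 4:  # naive "is this worth remembering" policy
            store.add(MemoryEntry(content=msg, source="conversation"))
            kept += 1
    return kept

store = SimpleStore()
n = persist_facts(["ok", "The user prefers window seats on flights."], store)
```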
## Example Implementation

I have a branch that implements this, with an example notebook highlighting it.
```python
# Imports for the example; exact module paths may differ across autogen versions.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.memory._base_memory import MemoryEntry
from autogen_agentchat.memory._chroma_memory import ChromaMemory, ChromaMemoryConfig
from autogen_agentchat.task import Console, MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models import OpenAIChatCompletionClient

# Initialize memory
chroma_memory = ChromaMemory(
    name="travel_memory",
    config=ChromaMemoryConfig(
        collection_name="travel_facts",
        # Configure number of results to return instead of similarity threshold
        k=1,
    ),
)

# Add some travel-related memories
await chroma_memory.add(MemoryEntry(
    content="Paris is known for the Eiffel Tower and amazing cuisine.",
    source="travel_guide",
))
await chroma_memory.add(MemoryEntry(
    content="The most important thing about Tokyo is that it has the world's busiest railway station - Shinjuku Station.",
    source="travel_facts",
))

# Create agent with memory
agent = AssistantAgent(
    name="travel_agent",
    model_client=OpenAIChatCompletionClient(
        model="gpt-4o",
        # api_key="your_api_key"
    ),
    memory=chroma_memory,
    system_message="You are a travel expert",
)

agent_team = RoundRobinGroupChat([agent], termination_condition=MaxMessageTermination(max_messages=2))
stream = agent_team.run_stream(task="Tell me the most important thing about Tokyo.")
await Console(stream)
```
```
---------- user ----------
Tell me the most important thing about Tokyo.
---------- travel_agent ----------
One of the most important aspects of Tokyo is that it has the world's busiest railway station, Shinjuku Station. This station serves as a major hub for transportation, with millions of commuters and travelers passing through its complex network of train lines each day. It highlights Tokyo's status as a bustling metropolis with an advanced public transportation system.
[Prompt tokens: 72, Completion tokens: 66]
---------- Summary ----------
Number of messages: 2
Finish reason: Maximum number of messages 2 reached, current message count: 2
Total prompt tokens: 72
Total completion tokens: 66
Duration: 1.47 seconds
```
@ekzhu
Potentially relevant to support might be LangChain memory, where a basic memory is a list of messages: https://python.langchain.com/v0.1/docs/modules/memory/
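A message-list memory in that sense could be adapted to this protocol by treating `query` as "return the most recent messages". A hypothetical sketch (this does not use the actual LangChain API; `MessageListMemory` and its buffer semantics are assumptions):

```python
import asyncio
from typing import Any, List

class MessageListMemory:
    """Adapter sketch: a LangChain-style buffer memory is just a list of
    messages, so query() ignores relevance and returns the last k messages."""

    def __init__(self, k: int = 5) -> None:
        self._messages: List[str] = []
        self._k = k

    async def add(self, message: str, cancellation_token: Any = None) -> None:
        self._messages.append(message)

    async def query(
        self, query: str, cancellation_token: Any = None, **kwargs: Any
    ) -> List[str]:
        # Buffer semantics: recency, not similarity.
        return self._messages[-self._k:]

    async def clear(self) -> None:
        self._messages.clear()

async def demo() -> List[str]:
    mem = MessageListMemory(k=2)
    for m in ["hello", "I like trains", "book me a ticket"]:
        await mem.add(m)
    return await mem.query("anything")

recent = asyncio.run(demo())
```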
How is this related to load/save state?
One way to think about it is that loading and saving state concerns what happens before and after an agent is run. Memory is more dynamic: it is about injecting just-in-time context given the exact input the agent receives during a run.
## Why is this needed?

Provides an interface for supporting RAG/memory.
This is a great component to design.
@colombod, @rickyloynd-microsoft, @ekzhu: I have updated the issue above with a sample implementation. I'd love to hear whether there is appetite for this in AgentChat, or any general feedback, before additional progress is made. Draft PR here: https://github.com/microsoft/autogen/pull/4438