autogen icon indicating copy to clipboard operation
autogen copied to clipboard

Refactorization of ConversableAgent to unify async and sync code and better extensibility

Open davorrunje opened this issue 1 year ago • 17 comments

Note: this is a refactorization study and will change a lot as we explore different design patterns and how they fit together in the framework. Feedback and suggestions are more than welcome.

Why are these changes needed?

The current state.

Several issues and PRs involve how to extend the ConversableAgent class. Effort like Teachability for every agent (#534) and the concept of modularized agent capability is a big step toward solving this from a higher level. There are still low-level extension issues such as:

  • Logging #1146
  • Streaming message to frontend #394 #1290 #1313
  • Several bugs caused by async and sync code not "synced" in code changes like #1242 and issues related to unexpected behaviors of using async public methods (#1012)

While some patches have been made to address those issues but not the root cause, which is that the ConversableAgent was mostly designed to work in a console environment but not yet a server-side library, and the code duplication in async and sync public methods.

What is this PR for.

The main goal of this PR is to address the above and at the same time introducing a design pattern that would make it much easier to add low-level functionalities like logging, content filtering, RAG-style context expansion, and custom termination mechanism and many others. We also want to demonstrate that high-level capabilities like Teachability can be composed of re-usable low-level components.

What is this PR NOT for:

This PR is not for breaking the existing ConversableAgent methods. The ConversableAgent has many great methods like the register_for_llm and initiate_chat that are loved by users.

What can you do to help:

This is not a small change, so we want to have as many feedback AND HELP as possible. We will post some work items on this page as we move forward, and you are welcome to contribute!

Update 01/16/2024

@ekzhu and I came up with the new scheme which introduces a single new function/decorator hookable and implements Middleware pattern. An example of how to use it is as follows:

class A:
    def __init__(self, name: str) -> None:
        self.name = name

    @hookable
    def go(self, *args: Any, **kwargs: Any) -> str:
        return f"{self.name}.{format_function(self.go, *args, **kwargs)}"

class MyMiddleware:
    def __init__(self, name: str) -> None:
        self.name = name

    def call(self, *args: Any, next: Callable[..., Any], **kwargs: Any) -> str:
        retval = next(*args, **kwargs)
        return f"{self.name}.{format_function(self.call, retval)}"

    def trigger(self, *args: Any, **kwargs: Any) -> bool:
        return not ("skip_middleware" in kwargs and kwargs["skip_middleware"])

a = A("a")
add_middleware(A.go, MyMiddleware("mw"))

assert a.go(1, 2, 3, a=4, b=5) == "mw.call(a.go(1, 2, 3, a=4, b=5))"
assert a.go(1, 2, 3, a=4, b=5, skip_middleware=False) == "mw.call(a.go(1, 2, 3, a=4, b=5, skip_middleware=False))"
assert a.go(1, 2, 3, a=4, b=5, skip_middleware=True) == "a.go(1, 2, 3, a=4, b=5, skip_middleware=True)"

add_middleware(A.go, MyMiddleware("MW"))
assert a.go(1, 2, 3, a=4, b=5) == "mw.call(MW.call(a.go(1, 2, 3, a=4, b=5)))"

There can be more than one hookable method in each class. We can use this to implement reply and hook functions and probably many other things.

Update 01/17/2024

@tyler-suard-parker @joshkyh @bitnom @jackgerrits @rickyloynd-microsoft

You are welcome to try out this branch. We are currently working on replacing register_hook for now then we will move on to refactor existing generate_***_reply functions in the ConversableAgent class into a middleware and upgrade the generate_reply method to use the middleware -- so the current functionalities stay the same.

But we need someone to think about how to implement some of these new features using middleware, and add them to the generate_reply to enable new functionalities.

  1. Incoming and outgoing message streaming to web socket
  2. Incoming and outgoing message logging
  3. RAG-style message context expansion, e.g., retrieve relevant context from a vector database and expand the incoming message's content.
  4. Human middleware to short-circuit the rest of the middleware pipeline -- think a static file middleware in a web framework.
  5. Out-going message filtering. e.g., filtering api keys, passwords, etc.

Here is a simple example of logging middleware that logs incoming and outgoing messages.

class LoggingMiddleware:
  def call(self, message: Dict, next: Callable[[Dict], Dict]) -> Dict:
    logging.info(f"Incoming: {retval}")
    retval = next(message)
    logging.info(f"Outgoing: {retval}")
    return retval

Here is another for filtering out OpenAI API keys.

class FilterAPIKeyMiddleware:
  def call(self, message: Dict, next: Callable[[Dict], Dict]) -> Dict:
    retval = next(message)
    if retval.get("content", False):
      re.sub(r'(sk-\w{4})\w+', r'\1***', retval["content"])
    return retval

Another one for simple RAG-style context expansion:

class RAGMiddleware:
  def call(self, message: Dict, next: Callable[[Dict], Dict]) -> Dict:
    if message.get("content", False):
      # Expand the message content with some text retrieved from vector db.
      expansion = vectordb.search(message["content"], k=1)[0]
      message["content"] += expansion
    retval = next(message)
    return retval

We are also adding a decorator that would convert a function into a middleware, saving user the effort to write a class.

More updates 01/17/2024

Teachability is refactored using the Middleware pattern instead of hooks. This is the actual implementation right now:

def add_to_agent(self, agent: ConversableAgent):
    """Adds teachability to the given agent."""
    self.teachable_agent = agent

    # Register a middleware for processing the last message.
    class ProcessLastMessageMiddleware:
        def __init__(self, *, agent: ConversableAgent, teachability: Teachability):
            self.teachability = teachability
            self.agent = agent

        def call(self, agent: ConversableAgent, user_text: str, *, next: Callable[[str], str]):
            user_text = next(agent, user_text)
            return self.teachability.process_last_message(user_text)

        def trigger(self, agent: ConversableAgent, user_text: str):
            return self.agent == agent

    add_middleware(
        ConversableAgent.process_last_message_user_text,
        ProcessLastMessageMiddleware(agent=agent, teachability=self),
    )

Whenever ConversableAgent.process_last_message_user_text is called, the ProcessLastMessageMiddleware.call is invoked and a wrapper to the original ConversableAgent.process_last_message_user_text is passed as the next parameter. All typing hints here are optional, they are present only to help understand what the expected parameters are.

There is some cleanup and error handling remaining, but this is basically it. As Erik mentioned above, it is easy to write a set of standard MIddleware that covers the most common use cases.

Related issue number

Checks

  • [x] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
  • [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [x] I've made sure all auto checks have passed.

davorrunje avatar Jan 14 '24 02:01 davorrunje

Codecov Report

Attention: 209 lines in your changes are missing coverage. Please review.

Comparison is base (1ab2354) 32.48% compared to head (bbdc8dd) 49.48%.

Files Patch % Lines
...gen/agentchat/contrib/capabilities/teachability.py 5.08% 56 Missing :warning:
autogen/agentchat/conversable_agent.py 71.79% 45 Missing and 10 partials :warning:
autogen/agentchat/middleware/termination.py 71.65% 22 Missing and 14 partials :warning:
autogen/agentchat/middleware/llm.py 79.72% 24 Missing and 5 partials :warning:
autogen/agentchat/middleware/message_store.py 79.64% 15 Missing and 8 partials :warning:
autogen/agentchat/contrib/gpt_assistant_agent.py 0.00% 2 Missing :warning:
autogen/agentchat/contrib/compressible_agent.py 50.00% 1 Missing :warning:
autogen/agentchat/contrib/llava_agent.py 50.00% 1 Missing :warning:
autogen/agentchat/contrib/math_user_proxy_agent.py 75.00% 1 Missing :warning:
...ogen/agentchat/contrib/retrieve_assistant_agent.py 0.00% 1 Missing :warning:
... and 4 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1240       +/-   ##
===========================================
+ Coverage   32.48%   49.48%   +17.00%     
===========================================
  Files          41       49        +8     
  Lines        4907     5252      +345     
  Branches     1120     1238      +118     
===========================================
+ Hits         1594     2599     +1005     
+ Misses       3187     2485      -702     
- Partials      126      168       +42     
Flag Coverage Δ
unittests 49.42% <78.25%> (+16.98%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jan 14 '24 02:01 codecov-commenter

This starts to resemble more and more like middleware pattern: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/middleware/?view=aspnetcore-8.0

See the diagram above. Each request gets processed by a chained pipeline of middlewares. Each middleware does some work based on the message and current application state, and decides whether to pass control to the next one or "short-circuit" back to the previous one.

We can also use this design pattern to model our Conversable Agent, and I think it accurate describes how our agent currently behaves.

A simple LLM-based agent contains the following middlewares in order:

  1. Handles terminations (counter-based termination, message-based termination, and others) -- this is like a short-circuit middleware which returns before anything else is made.
  2. Message manipulation, i.e., for OPENAI message, we need to change the "role" key to "user" to fake a user message.
  3. Message augmentation. i.e., augment relevant context for each message by doing RAG-style retrieval. Teachability and memory can be considered this way.
  4. Bookkeeping, storing and logging incoming and outgoing messages.
  5. Each generate_reply method is a middleware as each of them gets triggered and decides whether to pass control to the next one. If it decides to short-circuit, the rest of the reply methods will be skipped.

ekzhu avatar Jan 14 '24 03:01 ekzhu

@ekzhu exactly! Hooks seemed like the easiest thing to try it out because it used in Teachability only, but we should apply the same principle to everything else as you described above.

davorrunje avatar Jan 14 '24 05:01 davorrunje

Maybe we can use this branch to try this design on teachability and perhaps a few other things, like loggings and content filtering. This would supersede #1215 and #1146 (@cheng-tan) so let's take the time to do it right.

ekzhu avatar Jan 14 '24 05:01 ekzhu

@ekzhu exactly, this should be a part of making the API consistent across the board. Hence the draft status, otherwise this is good to go.

davorrunje avatar Jan 14 '24 06:01 davorrunje

This might be the best pr of the repo rn. It's a better direction than the simple observers I've been using. It will be exciting to hook all methods.

Expanding on this draft would greatly benefit from a comprehensive refactor and utilization of pydantic to its full potential. I'll push whatever I can think of for that in my next work-session, and probably have a few methods worth adding to Hookable. There's a lot of potential there.

bitnom avatar Jan 14 '24 20:01 bitnom

I've started down this path now. Here are some preliminary references in-case anyone is interested in existing implementations:

https://pypi.org/project/middleware/

https://pypi.org/project/observer-hooks/

https://pypi.org/project/func-hooks/

https://pypi.org/project/better-hooks no repo so I pushed to: https://github.com/bitnom/better-hooks

then I got the AI involved:

  • Wrapt: A module for decorators, wrappers, and monkey patching, offering control over the execution process and suitable for implementing pre-hooks and post-hooks.

    Website: https://wrapt.readthedocs.io

  • Pluggy: The plugin management system used by pytest, allowing the definition of hook specifications and the implementation of plugins that can extend or modify the behavior of a host program.

    Website: https://pluggy.readthedocs.io

  • Aspectlib: An aspect-oriented programming library for Python, which helps in separating concerns in a program, similar to middleware.

    Website: https://python-aspectlib.readthedocs.io/en/latest/

  • Events: A lightweight event library for adding hooks (event handlers) to Python code, implementing the observer pattern.

    GitHub: https://github.com/pyeve/events

  • Blinker: Provides support for Signals and Slots, a callback mechanism that can be used in various applications.

    Website: https://blinker.readthedocs.io/en/stable/

Edit: fixed 2 of the links. I've not used any of these modules before, but having skimmed through their docs:

wrapt, pluggy, aspectlib, and blinker all look awesome. "Events" sounds not so great. I'm not suggesting we need to use an existing lib but it should be seriously considered. I'm sure a lot of problems have been solved in them. Some of these seem quite battle-tested.

bitnom avatar Jan 15 '24 12:01 bitnom

With this, we could add @hookable_method to https://github.com/microsoft/autogen/blob/main/autogen/oai/client.py#L416 to hook up what coming to and from LLM, right?...

ref: https://github.com/microsoft/autogen/pull/1146

kittenkill avatar Jan 15 '24 12:01 kittenkill

With this, we could add @hookable_method to https://github.com/microsoft/autogen/blob/main/autogen/oai/client.py#L416 to hook up what coming to and from LLM, right?...

ref: #1146

for sure. We could actually decorate literally every class/func/method in autogen to support the hooks system, which is what I think should be done.

note: This PR feels like it could take a while to be feature-complete. There is already a simpler PR ahead of this one which allows a custom openai wrapper: https://github.com/microsoft/autogen/pull/1217

bitnom avatar Jan 15 '24 13:01 bitnom

Yes, i saw that other PR too. I like, IMHO, this generic hook aproach as lets you decouple logic from the base-code.

Maybe, one would even ENABLE_HOOKS="logging,other_ones", with already basic built-in hooks so users can just enable them like flags for common things.

kittenkill avatar Jan 15 '24 13:01 kittenkill

Yes, i saw that other PR too. I like, IMHO, this generic hook aproach as lets you decouple logic from the base-code.

Maybe, one would even ENABLE_HOOKS="logging,other_ones", with already basic built-in hooks so users can just enable them like flags for common things.

really now I feel like subclassing and using one of those modules I listed on all the things.

bitnom avatar Jan 15 '24 14:01 bitnom

Not coming from strong Python background so don't want to overly stress my opinion here. Still - As I read this PR conversations, and hanging around the relevant existing code in the repo for a while I feel like "override anything you want" approach would generate a problem in the other directions.

Here is how I see it - at the moment even for replacing a simple reply you have to work pretty hard - but at least it's clear what are the customizable parts of an agent. Or in other words, when does an "agent" is no longer a agent.

Basically, I love this PR direction of using an already established method for extending and mixing in new behavior. I also all for deduplicating the async specific code as much as possible. But - there should always be a clear vision of what are the ways this could be done and in which points. Otherwise maintaining a strong backward compatible API (which means user base can keep upgrading without forking, abandoning) will be almost impossible. Same goes for documentation - showing 100 possible ways of doing the same thing would confuse new users and reduce adoption rate.

I would suggest:

  • Recognize from known use cases in the wild how do users WANT to extend and customize the library today. (not in a distant future)
  • Design this extensibility looking into a future of endless variations on the above use cases (not other use cases)
  • Look for an established Pythonian way this shoudl be done
  • Hopefully find a backward compatible way of weaving that in without breaking exiting API

Hope this help, thank you so much for your work!

yoadsn avatar Jan 15 '24 15:01 yoadsn

Not coming from strong Python background so don't want to overly stress my opinion here. Still - As I read this PR conversations, and hanging around the relevant existing code in the repo for a while I feel like "override anything you want" approach would generate a problem in the other directions.

Here is how I see it - at the moment even for replacing a simple reply you have to work pretty hard - but at least it's clear what are the customizable parts of an agent. Or in other words, when does an "agent" is no longer a agent.

Basically, I love this PR direction of using an already established method for extending and mixing in new behavior. I also all for deduplicating the async specific code as much as possible. But - there should always be a clear vision of what are the ways this could be done and in which points. Otherwise maintaining a strong backward compatible API (which means user base can keep upgrading without forking, abandoning) will be almost impossible. Same goes for documentation - showing 100 possible ways of doing the same thing would confuse new users and reduce adoption rate.

I would suggest:

  • Recognize from known use cases in the wild how do users WANT to extend and customize the library today. (not in a distant future)
  • Design this extensibility looking into a future of endless variations on the above use cases (not other use cases)
  • Look for an established Pythonian way this shoudl be done
  • Hopefully find a backward compatible way of weaving that in without breaking exiting API

Hope this help, thank you so much for your work!

Thank you for the valuable feedback! We need to recognize the fact that most autogen users who want to extend agent capabilities will not have strong python backgrounds. This should not prevent us from adopting clean and powerful extension mechanisms, but the machinery needs to be readable, carefully documented, and easy for almost anyone to use, as well as easy to maintain after all of us have moved on. If we require users to become versed in arcane libraries and abstruse programming patterns, those barriers will severely limit autogen's adoption.

rickyloynd-microsoft avatar Jan 15 '24 15:01 rickyloynd-microsoft

I added support for async functions and for conditions. Now it is easy to create MiddleWare class that would support the functionality needed to replace hooks and reply functions with triggers.

davorrunje avatar Jan 15 '24 16:01 davorrunje

To ground our discussion of refactorization, here's a summary of the hook-based process (introduced by #1091) that was used to refactor teachability into a composable, chainable capability for addition to any agent.

This hook process involves three mechanisms: Capability addition, Hook registration and Hook execution:

Mechanisms Agent objects and methods Capability objects and methods
Capability addition
extends an agent.
The app instantiates an agent, like
agent = ConversableAgent()
or one of its subclasses.
The app instantiates a capability, like
teachability = Teachability() then calls
teachability.add_to_agent(agent).
Hook registration
connects a
hook method to a
hookable method.
ConversableAgent implements
and calls hookable methods, like
ConversableAgent.process_last_message().

ConversableAgent implements
ConversableAgent.register_hook(
      hookable_method, hook).
The capability implements
hook methods, like
Teachability.process_last_message().

Teachability.add_to_agent(agent) calls
agent.register_hook(
      hookable_method, hook).
Hook execution ConversableAgent methods call its
hookable methods, which call
their registered hook methods (if any).
Capability hook methods are called by
their registered hookable methods.

rickyloynd-microsoft avatar Jan 16 '24 02:01 rickyloynd-microsoft

@ekzhu and I came up with the new scheme which introduces a single new function/decorator hookable and implements Middleware pattern. An example of how to use it is as follows:

class A:
    def __init__(self, name: str) -> None:
        self.name = name

    @hookable
    def go(self, *args: Any, **kwargs: Any) -> str:
        return f"{self.name}.{format_function(self.go, *args, **kwargs)}"

class MyMiddleware:
    def __init__(self, name: str) -> None:
        self.name = name

    def call(self, *args: Any, next: Callable[..., Any], **kwargs: Any) -> str:
        retval = next(*args, **kwargs)
        return f"{self.name}.{format_function(self.call, retval)}"

    def trigger(self, *args: Any, **kwargs: Any) -> bool:
        return not ("skip_middleware" in kwargs and kwargs["skip_middleware"])

a = A("a")
add_middleware(A.go, MyMiddleware("mw"))

assert a.go(1, 2, 3, a=4, b=5) == "mw.call(a.go(1, 2, 3, a=4, b=5))"
assert a.go(1, 2, 3, a=4, b=5, skip_middleware=False) == "mw.call(a.go(1, 2, 3, a=4, b=5, skip_middleware=False))"
assert a.go(1, 2, 3, a=4, b=5, skip_middleware=True) == "a.go(1, 2, 3, a=4, b=5, skip_middleware=True)"

add_middleware(A.go, MyMiddleware("MW"))
assert a.go(1, 2, 3, a=4, b=5) == "mw.call(MW.call(a.go(1, 2, 3, a=4, b=5)))"

There can be more than one hookable method in each class. We can use this to implement reply and hook functions and probably many other things.

davorrunje avatar Jan 16 '24 16:01 davorrunje

@tyler-suard-parker @joshkyh @bitnom @jackgerrits @rickyloynd-microsoft

You are welcome to try out this branch. We are currently working on replacing register_hook for now then we will move on to refactor existing generate_***_reply functions in the ConversableAgent class into a middleware and upgrade the generate_reply method to use the middleware -- so the current functionalities stay the same.

But we need someone to think about how to implement some of these new features using middleware, and add them to the generate_reply to enable new functionalities.

  1. Incoming and outgoing message streaming to web socket
  2. Incoming and outgoing message logging
  3. RAG-style message context expansion, e.g., retrieve relevant context from a vector database and expand the incoming message's content.
  4. Human middleware to short-circuit the rest of the middleware pipeline -- think a static file middleware in a web framework.
  5. Out-going message filtering. e.g., filtering api keys, passwords, etc.

ekzhu avatar Jan 17 '24 08:01 ekzhu

I like this PR and the idea behind it. IMO it's another good example of composition over inheritance and it enables user to extend an agent's ability without the necessary of creating a new type of agent.

Below are some questions

  • How to short-cut middleware, is that simply don't call next() function in middleware
  • Is there a way to revert h after calling add_middleware(h, mv, ...)

Also one feedback for add_middleware(h, ...) function. My two cents here is it would be better for add_middleware to return a new h with middleware registered rather than modify h directly so that it won't change the behaviors of previous callers to h.

LittleLittleCloud avatar Jan 18 '24 01:01 LittleLittleCloud

@LittleLittleCloud thanks! Could you provide the feedback inline in the code?

How to short-cut middleware, is that simply don't call next() function in middleware

Right. Once you return it hands control back to the previous middleware.

Is there a way to revert h after calling add_middleware(h, mv, ...)

The interface is in draft, but I believe you can easily reset the middleware from scratch if you choose to remove one of them.

ekzhu avatar Jan 18 '24 01:01 ekzhu

I like this PR and the idea behind it. IMO it's another good example of composition over inheritance and it enables user to extend an agent's ability without the necessary of creating a new type of agent.

Below are some questions

  • How to short-cut middleware, is that simply don't call next() function in middleware

Yes

  • Is there a way to revert h after calling add_middleware(h, mv, ...)

Yes, there is set_middleware functions, but I'll also include few more like a pop, replace, etc.

Also one feedback for add_middleware(h, ...) function. My two cents here is it would be better for add_middleware to return a new h with middleware registered rather than modify h directly so that it won't change the behaviors of previous callers to h.

You are absolutely right, I will change the add_middleware to be specific to an instance, not the class:

  add_middleware(
      agent.process_last_message_user_text,
      ProcessLastMessageMiddleware(agent=agent, teachability=self),
  )

davorrunje avatar Jan 18 '24 08:01 davorrunje

Update 01/18/2024

Middleware registration methods add_middleware and set_middleware are refactored to be attached to bounded methods as suggested by @LittleLittleCloud and @ekzhu.

class A:
    def __init__(self, name: str) -> None:
        self.name = name

    @register_for_middleware
    def process_message(self, msg: str, skip_middleware: Optional[bool] = None) -> str:
        return f"{self.name}.process_message({msg=})"

class MyMiddleware:
    def __init__(self, name: str) -> None:
        self.name = name

    def call(self, *args: Any, next: Callable[..., Any], **kwargs: Any) -> str:
        retval = next(*args, **kwargs)
        return f"{self.name}.{format_function(self.call, retval)}"

    def trigger(self, *args: Any, **kwargs: Any) -> bool:
        return not ("skip_middleware" in kwargs and kwargs["skip_middleware"])

a = A("a")
mw = MyMiddleware("mw")

# middleware attached to a bounded method a.process_message
add_middleware(a.process_message, mw)

assert a.process_message("hello") == "mw.call(a.process_message(msg='hello'))"

mw2 = MyMiddleware("mw2")
add_middleware(a.process_message, mw2)
assert a.process_message("hello") == "mw.call(mw2.call(a.process_message(msg='hello')))"

b = A("b")
with pytest.raises(ValueError):
    # mw is already attached to a.process_message
    add_middleware(b.process_message, mw)

mwb = MyMiddleware("mwb")
add_middleware(b.process_message, mwb)

# only mwb middleware is called on calling b.process_message
assert b.process_message("hello") ==  "mwb.call(b.process_message(msg='hello'))"

# only mw and mw2 are called on a.process_message
assert a.process_message("hello") ==  "mw.call(mw2.call(a.process_message(msg='hello')))"

davorrunje avatar Jan 18 '24 16:01 davorrunje

@ekzhu @davorrunje @LittleLittleCloud Let's keep in mind also that because AutoGen is easy for beginners to use, it would be great if the middlewares are easy for beginners to understand and use as well. Maybe some tutorials or examples for specific applications?

tyler-suard-parker avatar Jan 18 '24 19:01 tyler-suard-parker

A framework should be difficult to write and easy to use :) This approach with middleware patterns has been proved to be very successful in Starlette and FastAPI. Of course, we need to write documentation and provide many examples. Again, FastAPI docs are a very good example on how to do it. Users don't really understand all the magic we do behind the scene, it just works as expected.

davorrunje avatar Jan 18 '24 20:01 davorrunje

Update 01/19/2024

The code was internally refactored so it accurately uses the signature of a decorated function in call() methods of a MIddleware class. Another change is adding the a_call method to Middleware classes and removing of trigger method. Having both call() and a_call allows for the most efficient implementation. Decorators for automatically generating call from a_call and vice versa will be added shortly so we'll be still able to mix sync/async styles if we are willing to pay the price in reduced performance. Examples of Middleware classes were added to the ConversableAgent. Here is a simple one performing logging:

# notice that the `call`` signature must match the function decorated with `register_for_middleware`:
# passing arguments to call() functions must the the same as passing arguments
# to generate_reply() apart from next being passed as a keyword argument
# default values must also be the same
class _PrintReplyMiddleware:
    def __init__(self, agent: Agent):
        self._agent = agent

    def call(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        # next will be passed as a keyword argument
        next: Optional[Callable[..., Any]] = None,
    ) -> Tuple[bool, Optional[str]]:
        print(f"generate_reply() called: {sender} sending {messages[-1] if messages else messages}'")
        retval = next(messages, sender)
        return retval
    
    async def a_call(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        next: Optional[Callable[..., Any]] = None,
    ) -> Tuple[bool, Optional[str]]:
        print(f"a_generate_reply() called: {sender} sending {messages[-1] if messages else messages}'")
        retval = await next(messages, sender)
        return retval

class ConversableAgent(Agent):
    def __init__(self, *args, **kwargs):
        ...
        # attaching middleware to a registered method 
        add_middleware(self.generate_reply, _PrintReplyMiddleware(self))
        add_middleware(self.a_generate_reply, _PrintReplyMiddleware(self))

    @register_for_middleware
    def generate_reply(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        exclude: Optional[List[Callable]] = None,
    ) -> Union[str, Dict, None]:
        ...

    @register_for_middleware
    async def a_generate_reply(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[Agent] = None,
        exclude: Optional[List[Callable]] = None,
    ) -> Union[str, Dict, None]:
        ...

All the tests are passing and we are ready to start refactoring the ConversableAgent class.

davorrunje avatar Jan 19 '24 21:01 davorrunje

Update 01/20/2024

Created the following middleware:

  1. ToolUseMiddleware
  2. LLMMiddleware
  3. CodeExecutionMiddleware
  4. TerminationAndHumanReplyMiddleware
  5. MessageStoreMiddleware
  6. TeachabilityMiddleware

See autogen/middleware and contrib/capability/teachability.py

Refactored ConversableAgent by composing it using the middleware above. All public methods are backward-compatible.

Fixed some tests. The failing tests should be easy to fix.

Next step:

  1. Use wrapper to unify sync and async code path.
  2. Utilities for building middleware chain and validating call(...) method signatures.
  3. Fix all tests.
  4. Update code-level documentation, remove recommendation for subclassing.

ekzhu avatar Jan 21 '24 08:01 ekzhu

Update 01/26/2024

Async/sync mixing works now in all cases

  • Function and tool calling is working in all combinations of async/sync calls.
  • Code execution works in a_initialize_chat now.

Quality improvements

All tests are passing now. Code coverage was significantly improved with the goal of having over 90% code covered by tests. Type annotations are fixed and mypy reports no errors in autogen/agentchat/middleware and test/agentchat/middleware folders.

autogen/agentchat/middleware/base.py                                6      0      0      0   100%
autogen/agentchat/middleware/code_execution.py                    108      0     36      0   100%
autogen/agentchat/middleware/llm.py                               143      4     62      4    95%
autogen/agentchat/middleware/message_store.py                     113      5     60      9    92%
autogen/agentchat/middleware/termination.py                       127     22     66     14    79%
autogen/agentchat/middleware/tool_use.py                          143      0     54      0   100%

Next steps:

  1. Utilities for building middleware chain and validating call(...) method signatures.
  2. Update code-level documentation, remove recommendation for subclassing.
  3. Write a tutorial on extending ConversibleAgent using middleware instead of subclassing.

davorrunje avatar Jan 26 '24 15:01 davorrunje

I'm going to be honest, even as a mid-level developer, I can't understand a word of the example code. Aren't we making AutoGen easy for everyone to use, regardless of their skill level? The basic Autogen code is fairly simple, set up a model, instantiate an agent, initiate that agent chat. Where do these middlewares fit into that process?

Tylersuard avatar Jan 28 '24 05:01 Tylersuard

Thank you for the valuable feedback! We need to recognize the fact that most autogen users who want to extend agent capabilities will not have strong python backgrounds. This should not prevent us from adopting clean and powerful extension mechanisms, but the machinery needs to be readable, carefully documented, and easy for almost anyone to use, as well as easy to maintain after all of us have moved on. If we require users to become versed in arcane libraries and abstruse programming patterns, those barriers will severely limit autogen's adoption.

Tylersuard avatar Jan 28 '24 05:01 Tylersuard

I'm going to be honest, even as a mid-level developer, I can't understand a word of the example code. Aren't we making AutoGen easy for everyone to use, regardless of their skill level? The basic Autogen code is fairly simple, set up a model, instantiate an agent, initiate that agent chat. Where do these middlewares fit into that process?

Middleware is meant for the framework developer, not for application developer. It is not changing the interface of AutoGen, rather it is changing the backend and how the "under-the-hood" stuff is written.

You can read the PR description about the motivation.

ekzhu avatar Jan 28 '24 06:01 ekzhu

Currently in AutoGen, each incoming message is handled by a pipeline of registered reply functions. Each reply function is triggered by a trigger function. If a reply function is triggered and signaled it is a final reply, it short-circuits the pipeline and returns the generated reply back to the sender.

This design pattern is described in the AutoGen paper (https://arxiv.org/pdf/2308.08155.pdf, Section 2). This PR is a refactor, we convert reply function into middleware class, so it can better handle states like code executors, message history, etc.

ekzhu avatar Jan 28 '24 06:01 ekzhu