[Core] implement redis cache mode
Why are these changes needed?
This adds a Redis mode to the cache. This way I can have multiple processes in separate containers running the same application (that is using autogen) and they can share the LLM cache (whereas with the current disk cache implementation the SQLite instance ends up being machine-local and can't easily be shared across multiple containers/pods).
The Redis cache uses pickling, the same as the disk cache implementation, so it should be functionally equivalent to the disk cache version.
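For illustration, here is a minimal sketch (not this PR's exact implementation) of what a pickle-backed Redis cache get/set can look like; the class name, key prefix, and method names below are placeholders:

```python
import pickle
import redis


class RedisCacheSketch:
    """Illustrative only: stores pickled values in Redis under a seed-prefixed key."""

    def __init__(self, seed, redis_url):
        self.seed = seed
        self.cache = redis.Redis.from_url(redis_url)

    def _prefixed_key(self, key):
        return f"autogen:{self.seed}:{key}"

    def get(self, key, default=None):
        result = self.cache.get(self._prefixed_key(key))
        if result is None:
            return default
        return pickle.loads(result)

    def set(self, key, value):
        self.cache.set(self._prefixed_key(key), pickle.dumps(value))
```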
Docs were added inline and also in agent_chat.md:
LLM Caching
Legacy Disk Cache
By default, you can specify a `cache_seed` in your `llm_config` in order to take advantage of a local DiskCache-backed cache. This cache will be used to store the results of your LLM calls and to return results for the same input without making a call to the LLM. This is useful for saving on compute costs and for speeding up inference.
```python
assistant = AssistantAgent(
    "coding_agent",
    llm_config={
        "cache_seed": 42,
        "config_list": OAI_CONFIG_LIST,
        "max_tokens": 1024,
    },
)
```
Setting this `cache_seed` param to `None` will disable the cache.
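For instance, disabling the cache looks like this (a minimal sketch reusing the `AssistantAgent` and `OAI_CONFIG_LIST` from the snippet above):

```python
# Disable LLM caching entirely by passing cache_seed=None.
assistant = AssistantAgent(
    "coding_agent",
    llm_config={
        "cache_seed": None,  # no caching
        "config_list": OAI_CONFIG_LIST,
        "max_tokens": 1024,
    },
)
```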
Configurable Context Manager
A new configurable context manager allows you to easily turn LLM caching on and off, using either DiskCache or Redis. All LLM agents inside the context manager will use the same cache.
```python
from autogen.cache.cache import Cache

# Use Redis as the cache backend.
with Cache.redis(cache_seed=42, redis_url="redis://localhost:6379/0") as cache_client:
    user.initiate_chat(assistant, message=coding_task, cache_client=cache_client)

# Use DiskCache as the cache backend.
with Cache.disk(cache_seed=42, cache_dir=".cache") as cache_client:
    user.initiate_chat(assistant, message=coding_task, cache_client=cache_client)
```
Here's an example of the new integration test running in CI (note I had to set up my fork to get it to run; I think it will only run when the branch is being merged into main, and in my fork the other tests fail due to my OAI_CONFIG_LIST not being correct).
Integration test coverage for the new code I added:
➜ autogen git:(vr/redis_cache) ✗ coverage run -a -m pytest test/agentchat/test_cache.py
=============================================================================================================== test session starts ===============================================================================================================
platform darwin -- Python 3.11.4, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/vijay/oss/autogen
configfile: pyproject.toml
plugins: Faker-19.3.0, anyio-3.7.1
collected 3 items
test/agentchat/test_cache.py ... [100%]
=============================================================================================================== 3 passed in 47.20s ================================================================================================================
And then per PR feedback I added some unit tests for the cache implementations.
➜ autogen git:(vr/redis_cache) ✗ coverage run -a -m pytest test/cache
=============================================================================================================== test session starts ===============================================================================================================
platform darwin -- Python 3.11.4, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/vijay/oss/autogen
configfile: pyproject.toml
plugins: Faker-19.3.0, anyio-3.7.1
collected 14 items
test/cache/test_cache.py .... [ 28%]
test/cache/test_disk_cache.py ..... [ 64%]
test/cache/test_redis_cache.py ..... [100%]
=============================================================================================================== 14 passed in 1.24s ================================================================================================================
Related issue number
Checks
- [x] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
@microsoft-github-policy-service agree company="Regrello"
Codecov Report
Attention: 13 lines in your changes are missing coverage. Please review.
Comparison is base (e97b639) 33.43% compared to head (7e088f9) 66.70%.
Additional details and impacted files
@@ Coverage Diff @@
## main #1222 +/- ##
===========================================
+ Coverage 33.43% 66.70% +33.26%
===========================================
Files 33 38 +5
Lines 4462 4586 +124
Branches 1043 1123 +80
===========================================
+ Hits 1492 3059 +1567
+ Misses 2847 1212 -1635
- Partials 123 315 +192
Flag | Coverage Δ |
---|---|
unittests | 66.65% <90.07%> (+33.26%) :arrow_up: |
Thanks for the feedback, I'll get on those changes over this weekend!
ok @ekzhu all your feedback should be addressed.
First of all, I like this a lot and it is much needed for scaling out!
My only concern is regarding the usage of llm_config for specifying redis_url, because this has nothing to do with LLMs as such. I think we could use an alternative way of specifying what cache to use, by either calling a setup function and/or a context manager. Something like this:
- Setup function:
  ```python
  use_redis_cache(url="...")
  initiate_chat(...)
  ```
- Context manager:
  ```python
  with redis_cache(url="..."):
      initiate_chat(...)
  ```
In that case I think we'd want to pull out cache_seed as well, which I can see was renamed after OpenAI introduced the seed param, so it's also a bit confusing for it to be in llm_config as it is today (but that becomes much more of a breaking change).
> In that case I think we'd want to pull out cache_seed as well, which I can see was renamed after OpenAI introduced the seed param, so it's also a bit confusing for it to be in llm_config as it is today (but that becomes much more of a breaking change)
That's a valid point. It does feel out of place there.
> ok @ekzhu all your feedback should be addressed.

Can you join our Discord? We can have more discussion there.
I agree with Davor. Declaring the context for caching separately from the llm_config is a good idea. We may want to refactor llm_config as it starts to feel unmanageable with non-OpenAI models.
I would not worry about forcing a unified configuration interface for all different cache providers, but I would keep all of them under the same entry point. Here is what I mean:
```python
with cache.redis(url="...", **other_parameters):
    a.initiate_chat(...)

with cache.disk(cache_seed=41, ...):
    a.initiate_chat(...)

with cache.sqlite(connection_string="..."):
    a.initiate_chat(...)
```
As you can see, all future cache providers can be called from a single entry point, cache. My reason for this is that in the future we may want to have other context managers, and keeping it hierarchical helps with not confusing them. It also helps with auto-complete.
To avoid breaking backward compatibility we can keep the cache_seed in the llm_config for now, and only enable it for diskcache. But for every other cache provider we can use the context manager interface or setup functions (or both!). In the future, after we have finished refactoring llm_config, we can then put in a deprecation notice.
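A minimal sketch of what that backward-compatibility rule could look like; the resolve_cache helper and its structure are purely illustrative, not code from this PR. An explicitly passed cache client wins, and otherwise a cache_seed in llm_config falls back to the legacy DiskCache path:

```python
from autogen.cache.cache import Cache


def resolve_cache(cache_client, llm_config):
    # Hypothetical helper illustrating the fallback order described above.
    if cache_client is not None:
        return cache_client  # new context-manager interface (Redis, disk, ...)
    cache_seed = (llm_config or {}).get("cache_seed")
    if cache_seed is not None:
        return Cache.disk(cache_seed=cache_seed)  # legacy DiskCache behavior
    return None  # caching disabled
```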
ok updated per your feedback @ekzhu
LGTM 👍
marking it as draft while I do some full external testing (using my branch in another project and making sure it behaves as I'd expect end to end)
OK great, verified locally:
```python
def healthcheck(self):
    joker_agent = autogen.AssistantAgent(
        name="Joker",
        llm_config=self.oai_config,
        system_message=AgentPrompts.healthcheck()["Joker"],
    )
    joker_user_proxy = autogen.UserProxyAgent(
        "user_proxy",
        code_execution_config=False,
        system_message=AgentPrompts.healthcheck()["user_proxy"],
        human_input_mode="NEVER",
        llm_config=self.oai_config,
        max_consecutive_auto_reply=1,
    )
    with Cache.redis() as cache:
        joker_user_proxy.initiate_chat(
            joker_agent, message="Please generate a joke", cache_client=cache
        )
    jsonm = ""
    for m in joker_user_proxy.chat_messages[joker_agent]:
        if m["role"] == "user":
            jsonm = m["content"]
            # try to JSON-parse; if it works then break, otherwise continue
            try:
                _ = json.loads(jsonm)
                break
            except json.JSONDecodeError:
                continue
    return json.loads(jsonm)
```
works as expected, as does
```python
with Cache.disk() as cache:
    joker_user_proxy.initiate_chat(
        joker_agent, message="Please generate a joke", cache_client=cache
    )
```
as does setting `llm_config['cache_seed']`,
and then some other more complex scenarios (using group chat) that I can't post here, but I confirmed to myself that they work when wrapped in the cache:
```python
with Cache.redis() as cache:
    user_proxy.initiate_chat(
        manager,
        message="""REDACTED""",
        cache_client=cache,
    )
```
Docs were already added, see https://github.com/microsoft/autogen/pull/1222/files#diff-623ecb7c953c4e7dc29eb44ac6de40b7a3c7f27929833c732a418bc1b419ec27R288
Very nice 👏 👏 👏
👏👏👏
Ah I think I lost your change when updating from upstream main
> Ah I think I lost your change when updating from upstream main

No problem. Re-applied.
@vijaykramesh could you move the tests? Next time you can give me write access to your fork so I can make changes myself from the IDE. Doing it from the GitHub website is quite difficult...
OK cool, moving the tests actually greatly simplifies things; now they will just run with the normal build and openai jobs.
The build job ran and reported coverage to https://app.codecov.io/github/microsoft/autogen/commit/354157af2f6907668994dd344599ec594d74cd97/tree/autogen/cache
Thanks @vijaykramesh.
@sonichi docs are updated.
@sonichi
- Moved cache to OpenAIWrapper's config argument, so now both the OpenAIWrapper constructor and the create() method can take an optional cache object to enable caching (see the sketch below).
- Added tests for the legacy cache and the new cache for the OpenAIWrapper class.
- Updated docs.
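A hedged usage sketch of what that looks like from the caller's side; the exact parameter name (here assumed to be `cache`) may differ from the final PR, and `OAI_CONFIG_LIST` is assumed to be defined as in the docs snippet above:

```python
from autogen import OpenAIWrapper
from autogen.cache.cache import Cache

with Cache.disk(cache_seed=42) as cache:
    # Cache supplied at construction time (applies to every create() call)...
    client = OpenAIWrapper(config_list=OAI_CONFIG_LIST, cache=cache)
    response = client.create(messages=[{"role": "user", "content": "What is 2 + 2?"}])

    # ...or supplied per call on create().
    client_no_default = OpenAIWrapper(config_list=OAI_CONFIG_LIST)
    response2 = client_no_default.create(
        messages=[{"role": "user", "content": "What is 2 + 2?"}], cache=cache
    )
```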
One more thing I realize we need to add: the exit behavior of the cache object. Currently it simply closes the connection to the cache store -- this will leave the cache object referenced by each agent pointing to a closed cache store. A more graceful behavior would be to let each cache object detach from the agents once the context is exited.
Let me know if this makes sense. @vijaykramesh @sonichi @davorrunje
> One more thing I realize we need to add: the exit behavior of the cache object. Currently it simply closes the connection to the cache store -- this will leave the cache object referenced by each agent pointing to a closed cache store. A more graceful behavior would be to let each cache object detach from the agents once the context is exited. Let me know if this makes sense. @vijaykramesh @sonichi @davorrunje

Is this specific to the Redis cache or also to DiskCache?
> Is this specific to the Redis cache or also to DiskCache?
Redis cache and disk cache have different behaviors after being closed:
- Attempting to use the Redis cache connection after it has been closed will result in an error.
- Attempting to use DiskCache's Cache after it has been closed will re-open the cache and won't result in an error.
So, we will see weird behavior after the context exits for these two caches:
```python
with Cache.redis(...) as cache:
    agent.initiate_chat(b, ..., cache=cache)

# This will result in an error, as the cache has already been closed.
agent.generate_reply(...)

with Cache.disk(...) as cache:
    agent.initiate_chat(b, ..., cache=cache)

# This will be fine, but the cache is still used.
agent.generate_reply(...)
```
My thinking is that the cache should detach itself from every agent instance once it exits the with context.
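A minimal sketch of that detach-on-exit idea; the DetachingCache wrapper, its attach() method, and the cache_client attribute are illustrative names only, not the PR's actual API:

```python
class DetachingCache:
    """Illustrative only: clears agent references to the cache when the context exits."""

    def __init__(self, backend):
        self.backend = backend  # e.g. a Redis- or DiskCache-backed cache object
        self._agents = []       # agents currently pointed at this cache

    def attach(self, agent):
        agent.cache_client = self
        self._agents.append(agent)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Detach from every agent so none of them keeps a reference to a
        # closed cache store after the context exits.
        for agent in self._agents:
            agent.cache_client = None
        self._agents.clear()
        self.backend.close()
```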
Update: After testing this myself, I was incorrect to say the Redis cache will raise an error if one attempts to use it after close(). In fact, both code snippets above will be fine.
> Redis cache and disk cache have different behaviors after being closed: ... My thinking is that the cache should detach itself from every agent instance once it exits the with context.

I agree. This will certainly cause problems in the future if not unified.
@sonichi @vijaykramesh @davorrunje
Sorry, I was incorrect to say that Redis and DiskCache have different exit behaviors. I tested myself and there is no difference: both caches stay alive and re-open once you use them again.
I pushed another commit to make sure a_run_chat has the same handling for cache as run_chat.