[Core] implement redis cache mode
Why are these changes needed?
This adds a Redis mode to the cache. This way I can have multiple processes in separate containers running the same application (that is using autogen) and they can share the LLM cache (whereas with the current disk cache implementation the SQLite instance ends up being machine-local and can't easily be shared across multiple containers/pods).
The Redis cache uses pickling, the same as the disk cache implementation, so it should be functionally equivalent to the disk cache version.
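For illustration, here is a minimal sketch (not this PR's exact implementation) of what a pickle-backed Redis cache get/set can look like; the class name, key prefix, and method names below are placeholders:

```python
import pickle
import redis


class RedisCacheSketch:
    """Illustrative only: stores pickled values in Redis under a seed-prefixed key."""

    def __init__(self, seed, redis_url):
        self.seed = seed
        self.cache = redis.Redis.from_url(redis_url)

    def _prefixed_key(self, key):
        return f"autogen:{self.seed}:{key}"

    def get(self, key, default=None):
        result = self.cache.get(self._prefixed_key(key))
        if result is None:
            return default
        return pickle.loads(result)

    def set(self, key, value):
        self.cache.set(self._prefixed_key(key), pickle.dumps(value))
```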
Docs were added inline and also in agent_chat.md:
LLM Caching
Legacy Disk Cache
By default, you can specify a `cache_seed` in your `llm_config` in order to take advantage of a local DiskCache-backed cache. This cache will be used to store the results of your LLM calls and to return results for the same input without making a call to the LLM. This is useful for saving on compute costs and for speeding up inference.
```python
assistant = AssistantAgent(
    "coding_agent",
    llm_config={
        "cache_seed": 42,
        "config_list": OAI_CONFIG_LIST,
        "max_tokens": 1024,
    },
)
```
Setting this `cache_seed` param to `None` will disable the cache.
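For instance, disabling the cache looks like this (a minimal sketch reusing the `AssistantAgent` and `OAI_CONFIG_LIST` from the snippet above):

```python
# Disable LLM caching entirely by passing cache_seed=None.
assistant = AssistantAgent(
    "coding_agent",
    llm_config={
        "cache_seed": None,  # no caching
        "config_list": OAI_CONFIG_LIST,
        "max_tokens": 1024,
    },
)
```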
Configurable Context Manager
A new configurable context manager allows you to easily turn LLM caching on and off, using either DiskCache or Redis. All LLM agents inside the context manager will use the same cache.
```python
from autogen.cache.cache import Cache

# Use Redis as the cache backend.
with Cache.redis(cache_seed=42, redis_url="redis://localhost:6379/0") as cache_client:
    user.initiate_chat(assistant, message=coding_task, cache_client=cache_client)

# Use DiskCache as the cache backend.
with Cache.disk(cache_seed=42, cache_dir=".cache") as cache_client:
    user.initiate_chat(assistant, message=coding_task, cache_client=cache_client)
```
Here's an example of the new integration test running in CI (note I had to set up my fork to get it to run; I think it will only run when the branch is being merged into main, and in my fork the other tests fail due to my OAI_CONFIG_LIST not being correct).
Integration test coverage for the new code I added:
➜ autogen git:(vr/redis_cache) ✗ coverage run -a -m pytest test/agentchat/test_cache.py
=============================================================================================================== test session starts ===============================================================================================================
platform darwin -- Python 3.11.4, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/vijay/oss/autogen
configfile: pyproject.toml
plugins: Faker-19.3.0, anyio-3.7.1
collected 3 items
test/agentchat/test_cache.py ... [100%]
=============================================================================================================== 3 passed in 47.20s ================================================================================================================
And then per PR feedback I added some unit tests for the cache implementations.
➜ autogen git:(vr/redis_cache) ✗ coverage run -a -m pytest test/cache
=============================================================================================================== test session starts ===============================================================================================================
platform darwin -- Python 3.11.4, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/vijay/oss/autogen
configfile: pyproject.toml
plugins: Faker-19.3.0, anyio-3.7.1
collected 14 items
test/cache/test_cache.py .... [ 28%]
test/cache/test_disk_cache.py ..... [ 64%]
test/cache/test_redis_cache.py ..... [100%]
=============================================================================================================== 14 passed in 1.24s ================================================================================================================
Related issue number
Checks
- [x] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
@microsoft-github-policy-service agree company="Regrello"
Codecov Report
Attention: 13 lines in your changes are missing coverage. Please review.
Comparison is base (e97b639) 33.43% compared to head (7e088f9) 66.70%.
Additional details and impacted files
@@ Coverage Diff @@
## main #1222 +/- ##
===========================================
+ Coverage 33.43% 66.70% +33.26%
===========================================
Files 33 38 +5
Lines 4462 4586 +124
Branches 1043 1123 +80
===========================================
+ Hits 1492 3059 +1567
+ Misses 2847 1212 -1635
- Partials 123 315 +192
Flag | Coverage Δ |
---|---|
unittests | 66.65% <90.07%> (+33.26%) :arrow_up: |
Thanks for the feedback, I'll get on those changes over this weekend!
ok @ekzhu all your feedback should be addressed.
First of all, I like this a lot and it is much needed for scaling out!
My only concern is regarding the usage of llm_config for specifying redis_url, because this has nothing to do with LLMs as such. I think we could use an alternative way of specifying what cache to use, by either calling a setup function and/or a context manager. Something like this:
- Setup function:
  ```python
  use_redis_cache(url="...")
  initiate_chat(...)
  ```
- Context manager:
  ```python
  with redis_cache(url="..."):
      initiate_chat(...)
  ```
In that case I think we'd want to pull out cache_seed as well, which I can see was renamed after OpenAI introduced the seed param, so it's also a bit confusing for it to be in llm_config as it is today (but that becomes much more of a breaking change).
> In that case I think we'd want to pull out cache_seed as well, which I can see was renamed after OpenAI introduced the seed param, so it's also a bit confusing for it to be in llm_config as it is today (but that becomes much more of a breaking change)
That's a valid point. It does feel out of place there.
> ok @ekzhu all your feedback should be addressed.

Can you join our Discord? We can have more discussion there.
I agree with Davor. Declaring the context for caching separately from the llm_config is a good idea. We may want to refactor llm_config as it starts to feel unmanageable with non-OpenAI models.
I would not worry about forcing a unified configuration interface for all different cache providers, but I would keep all of them under the same entry point. Here is what I mean:
```python
with cache.redis(url="...", **other_parameters):
    a.initiate_chat(...)

with cache.disk(cache_seed=41, ...):
    a.initiate_chat(...)

with cache.sqlite(connection_string="..."):
    a.initiate_chat(...)
```
As you can see, all future cache providers can be called from a single entry point, cache. My reason for this is that in the future we may want to have other context managers, and keeping it hierarchical helps with not confusing them. It also helps with auto-complete.
To avoid breaking backward compatibility we can keep the cache_seed in the llm_config for now, and only enable it for diskcache. But for every other cache provider we can use the context manager interface or setup functions (or both!). In the future, after we have finished refactoring llm_config, we can then put in a deprecation notice.
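A minimal sketch of what that backward-compatibility rule could look like; the resolve_cache helper and its structure are purely illustrative, not code from this PR. An explicitly passed cache client wins, and otherwise a cache_seed in llm_config falls back to the legacy DiskCache path:

```python
from autogen.cache.cache import Cache


def resolve_cache(cache_client, llm_config):
    # Hypothetical helper illustrating the fallback order described above.
    if cache_client is not None:
        return cache_client  # new context-manager interface (Redis, disk, ...)
    cache_seed = (llm_config or {}).get("cache_seed")
    if cache_seed is not None:
        return Cache.disk(cache_seed=cache_seed)  # legacy DiskCache behavior
    return None  # caching disabled
```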
ok updated per your feedback @ekzhu
LGTM 👍
marking it as draft while I do some full external testing (using my branch in another project and making sure it behaves as I'd expect end to end)
OK great, verified locally:
```python
def healthcheck(self):
    joker_agent = autogen.AssistantAgent(
        name="Joker",
        llm_config=self.oai_config,
        system_message=AgentPrompts.healthcheck()["Joker"],
    )
    joker_user_proxy = autogen.UserProxyAgent(
        "user_proxy",
        code_execution_config=False,
        system_message=AgentPrompts.healthcheck()["user_proxy"],
        human_input_mode="NEVER",
        llm_config=self.oai_config,
        max_consecutive_auto_reply=1,
    )
    with Cache.redis() as cache:
        joker_user_proxy.initiate_chat(
            joker_agent, message="Please generate a joke", cache_client=cache
        )
    jsonm = ""
    for m in joker_user_proxy.chat_messages[joker_agent]:
        if m["role"] == "user":
            jsonm = m["content"]
            # try to JSON-parse; if it works then break, otherwise continue
            try:
                _ = json.loads(jsonm)
                break
            except json.JSONDecodeError:
                continue
    return json.loads(jsonm)
```
works as expected, as does
```python
with Cache.disk() as cache:
    joker_user_proxy.initiate_chat(
        joker_agent, message="Please generate a joke", cache_client=cache
    )
```
as does setting `llm_config['cache_seed']`,
and then some other more complex scenarios (using group chat) that I can't post here, but I confirmed to myself that they work when wrapped in the cache:
```python
with Cache.redis() as cache:
    user_proxy.initiate_chat(
        manager,
        message="""REDACTED""",
        cache_client=cache,
    )
```
Docs were already added, see https://github.com/microsoft/autogen/pull/1222/files#diff-623ecb7c953c4e7dc29eb44ac6de40b7a3c7f27929833c732a418bc1b419ec27R288
Very nice 👏 👏 👏
👏👏👏
Ah I think I lost your change when updating from upstream main
> Ah I think I lost your change when updating from upstream main

No problem. Re-applied.
@vijaykramesh could you move the tests? Next time you can give me write access to your fork so I can make changes myself from the IDE. Doing it from the GitHub website is quite difficult...
OK cool, moving the tests actually greatly simplifies things; now they will just run with the normal build and openai jobs.
The build job ran and reported coverage to https://app.codecov.io/github/microsoft/autogen/commit/354157af2f6907668994dd344599ec594d74cd97/tree/autogen/cache
Thanks @vijaykramesh.
@sonichi docs are updated.
@sonichi
- Moved cache to OpenAIWrapper's config argument, so now both the OpenAIWrapper constructor and the create() method can take an optional cache object to enable caching (see the sketch below).
- Added tests for the legacy cache and the new cache for the OpenAIWrapper class.
- Updated docs.
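A hedged usage sketch of what that looks like from the caller's side; the exact parameter name (here assumed to be `cache`) may differ from the final PR, and `OAI_CONFIG_LIST` is assumed to be defined as in the docs snippet above:

```python
from autogen import OpenAIWrapper
from autogen.cache.cache import Cache

with Cache.disk(cache_seed=42) as cache:
    # Cache supplied at construction time (applies to every create() call)...
    client = OpenAIWrapper(config_list=OAI_CONFIG_LIST, cache=cache)
    response = client.create(messages=[{"role": "user", "content": "What is 2 + 2?"}])

    # ...or supplied per call on create().
    client_no_default = OpenAIWrapper(config_list=OAI_CONFIG_LIST)
    response2 = client_no_default.create(
        messages=[{"role": "user", "content": "What is 2 + 2?"}], cache=cache
    )
```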
One more thing I realize we need to add: the exit behavior of the cache object. Currently it simply closes the connection to the cache store -- this will leave the cache object referenced by each agent pointing to a closed cache store. A more graceful behavior would be to let each cache object detach from the agents once the context is exited.
Let me know if this makes sense. @vijaykramesh @sonichi @davorrunje
> One more thing I realize we need to add: the exit behavior of the cache object. Currently it simply closes the connection to the cache store -- this will leave the cache object referenced by each agent pointing to a closed cache store. A more graceful behavior would be to let each cache object detach from the agents once the context is exited. Let me know if this makes sense. @vijaykramesh @sonichi @davorrunje

Is this specific to the Redis cache or also to DiskCache?
> Is this specific to the Redis cache or also to DiskCache?
Redis cache and disk cache have different behaviors after being closed:
- Attempting to use the Redis cache connection after it has been closed will result in an error.
- Attempting to use DiskCache's Cache after it has been closed will re-open the cache and won't result in an error.
So, we will see weird behavior after the context exits for these two caches:
```python
with Cache.redis(...) as cache:
    agent.initiate_chat(b, ..., cache=cache)

# This will result in an error, as the cache has already been closed.
agent.generate_reply(...)

with Cache.disk(...) as cache:
    agent.initiate_chat(b, ..., cache=cache)

# This will be fine, but the cache is still used.
agent.generate_reply(...)
```
My thinking is that the cache should detach itself from every agent instance once it exits the with context.
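A minimal sketch of that detach-on-exit idea; the DetachingCache wrapper, its attach() method, and the cache_client attribute are illustrative names only, not the PR's actual API:

```python
class DetachingCache:
    """Illustrative only: clears agent references to the cache when the context exits."""

    def __init__(self, backend):
        self.backend = backend  # e.g. a Redis- or DiskCache-backed cache object
        self._agents = []       # agents currently pointed at this cache

    def attach(self, agent):
        agent.cache_client = self
        self._agents.append(agent)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Detach from every agent so none of them keeps a reference to a
        # closed cache store after the context exits.
        for agent in self._agents:
            agent.cache_client = None
        self._agents.clear()
        self.backend.close()
```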
Update: After testing this myself, I was incorrect to say the Redis cache will raise an error if one attempts to use it after close(). In fact, both code snippets above will be fine.
> Redis cache and disk cache have different behaviors after being closed: ... My thinking is that the cache should detach itself from every agent instance once it exits the with context.

I agree. This will certainly cause problems in the future if not unified.
@sonichi @vijaykramesh @davorrunje
Sorry, I was incorrect to say that Redis and DiskCache have different exit behaviors. I tested myself and there is no difference: both caches stay alive and re-open once you use them again.
I pushed another commit to make sure a_run_chat has the same handling for cache as run_chat.