google-common google-genai [feature]: Context Caching
Privileged issue
- [X] I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here.
Issue Content
Gemini now allows a developer to create a context cache with the system instructions, contents, tools, and model information already set, and then reference this cached context as part of a standard query. Content must be cached explicitly (i.e., caching is not automatic as part of a request or reply), and a cache expiration (TTL) can be set and later changed.
It does not appear to be supported in Vertex AI at this time.
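For reference, the underlying (non-LangChain) flow in the Node `@google/generative-ai` SDK looks roughly like the sketch below, adapted from the caching quickstart linked under References; exact option names may have shifted between SDK versions, and the transcript string is a placeholder:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";
import { GoogleAICacheManager } from "@google/generative-ai/server";

const apiKey = process.env.GOOGLE_API_KEY!;

// Placeholder: cached content must be large enough to meet the minimum token count.
const veryLongTranscript = "...";

// Explicitly create a cache with the model, system instructions, and contents baked in.
const cacheManager = new GoogleAICacheManager(apiKey);
const cache = await cacheManager.create({
  model: "models/gemini-1.5-flash-001",
  systemInstruction: "You answer questions about the attached transcript.",
  contents: [{ role: "user", parts: [{ text: veryLongTranscript }] }],
  ttlSeconds: 600, // expiration; the cache manager also allows updating/deleting the cache later
});

// Reference the cached context as part of a standard query.
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModelFromCachedContent(cache);
const result = await model.generateContent("Summarize the transcript.");
console.log(result.response.text());
```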
Open issues:
- Best paradigm for adding content to the cache or integrating with the LangChain history system
- Best paradigm for referencing an existing cache from a request
References:
- AI Studio / genai: https://ai.google.dev/gemini-api/docs/caching?lang=node
- REST: https://ai.google.dev/api/rest/v1beta/cachedContents#CachedContent
- LangChain: https://github.com/langchain-ai/langchain/issues/23259
Will bring this up again with the Python team - it seems like an amazing feature.
@jacoblee93 Hello, we are students from the University of Toronto who would like to contribute and are interested in this feature. Is it currently being implemented? If not, could we take it on? We also have a few related questions:
- Could you provide any further details on this feature and some possible use cases (for example, do we want to cache messages, large files, or something else)?
- How should the integration with the LangChain history system behave?
Hey @KevinZJN, sorry for the late response!
The current way to do this looks like this: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-anthropic/src/tests/chat_models.int.test.ts#L657
I haven't verified it, but I think this should generally work the same way, with Google's format passed through in the message content, though there may be more work to do. Feel free to take a stab at it!
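Concretely, the Anthropic integration lets a caller mark a content block with a provider-specific `cache_control` field that gets passed straight through to the API. A rough paraphrase of the linked test (details such as the beta header and model name may have drifted since it was written, and the long prompt is a placeholder):

```typescript
import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

// Placeholder for a large, reusable prompt worth caching.
const longSystemPrompt = "...";

const model = new ChatAnthropic({
  model: "claude-3-haiku-20240307",
  // Prompt caching was behind a beta header at the time of the linked test.
  clientOptions: {
    defaultHeaders: { "anthropic-beta": "prompt-caching-2024-07-31" },
  },
});

const res = await model.invoke([
  new SystemMessage({
    content: [
      {
        type: "text",
        text: longSystemPrompt,
        // Provider-specific cache marker, passed through inside the content block.
        cache_control: { type: "ephemeral" },
      },
    ],
  }),
  new HumanMessage("What types of messages are supported in LangChain?"),
]);
```

The idea would be to support Google's cached-content reference in a similarly pass-through way, rather than inventing a new LangChain-level abstraction.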
Hi @jacoblee93. We looked into how Google's context caching works, and there are a few points to note:
- There is a minimum token count for cached content, so whatever we cache needs to be large enough.
- The cache is stored on Google's side, and a key is returned to access it, along with a TTL for the cache. We will therefore need to modify the model's structure to store this information (rough sketch below). Do you think these two factors will break anything we have right now?
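To illustrate what we mean by "store this information for the model", something like the following is what we have in mind. This is purely hypothetical: `cachedContent` is not an existing `ChatGoogleGenerativeAI` option, and the cache name is made up.

```typescript
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const model = new ChatGoogleGenerativeAI({
  model: "gemini-1.5-flash-001",
  // Hypothetical option: the name/key returned when the cache was created,
  // only valid until its TTL expires.
  cachedContent: "cachedContents/abc123",
});

const res = await model.invoke("Summarize the cached transcript.");
```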
Thanks for the reply, we will take a look at this!