google-common google-genai [feature]: Context Caching
Privileged issue
- [X] I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here.
Issue Content
Gemini now allows a developer to create a context cache with the system instructions, contents, tools, and model information already set, and then reference this cached context as part of a standard query. Content must be cached explicitly (i.e., caching is not automatic as part of a request or reply), and a cache expiration (TTL) can be set and later changed.
It does not appear to be supported in Vertex AI at this time.
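For reference, the underlying (non-LangChain) flow in the Node `@google/generative-ai` SDK looks roughly like the sketch below, adapted from the caching quickstart linked under References; exact option names may have shifted between SDK versions, and the transcript string is a placeholder:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";
import { GoogleAICacheManager } from "@google/generative-ai/server";

const apiKey = process.env.GOOGLE_API_KEY!;

// Placeholder: cached content must be large enough to meet the minimum token count.
const veryLongTranscript = "...";

// Explicitly create a cache with the model, system instructions, and contents baked in.
const cacheManager = new GoogleAICacheManager(apiKey);
const cache = await cacheManager.create({
  model: "models/gemini-1.5-flash-001",
  systemInstruction: "You answer questions about the attached transcript.",
  contents: [{ role: "user", parts: [{ text: veryLongTranscript }] }],
  ttlSeconds: 600, // expiration; the cache manager also allows updating/deleting the cache later
});

// Reference the cached context as part of a standard query.
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModelFromCachedContent(cache);
const result = await model.generateContent("Summarize the transcript.");
console.log(result.response.text());
```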
Open issues:
- Best paradigm for adding content to the cache or integrating with the LangChain history system
- Best paradigm for referencing an existing cache from a request
References:
- AI Studio / genai: https://ai.google.dev/gemini-api/docs/caching?lang=node
- REST: https://ai.google.dev/api/rest/v1beta/cachedContents#CachedContent
- LangChain: https://github.com/langchain-ai/langchain/issues/23259
Will bring this up again with the Python team - it seems like an amazing feature.
@jacoblee93 Hello, we are students from the University of Toronto who would like to contribute and are interested in this feature. Is it currently being implemented? If not, could we take it on? We also have a few related questions:
- Could you provide any further details on this feature and some possible use cases (for example, do we want to cache messages, large files, or something else)?
- How should the integration with the LangChain history system behave?
Hey @KevinZJN, sorry for the late response!
The current way to do this looks like this: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-anthropic/src/tests/chat_models.int.test.ts#L657
I haven't verified it, but I think this should generally work the same way, with Google's format passed through in the message content, though there may be more work to do. Feel free to take a stab at it!
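Concretely, the Anthropic integration lets a caller mark a content block with a provider-specific `cache_control` field that gets passed straight through to the API. A rough paraphrase of the linked test (details such as the beta header and model name may have drifted since it was written, and the long prompt is a placeholder):

```typescript
import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

// Placeholder for a large, reusable prompt worth caching.
const longSystemPrompt = "...";

const model = new ChatAnthropic({
  model: "claude-3-haiku-20240307",
  // Prompt caching was behind a beta header at the time of the linked test.
  clientOptions: {
    defaultHeaders: { "anthropic-beta": "prompt-caching-2024-07-31" },
  },
});

const res = await model.invoke([
  new SystemMessage({
    content: [
      {
        type: "text",
        text: longSystemPrompt,
        // Provider-specific cache marker, passed through inside the content block.
        cache_control: { type: "ephemeral" },
      },
    ],
  }),
  new HumanMessage("What types of messages are supported in LangChain?"),
]);
```

The idea would be to support Google's cached-content reference in a similarly pass-through way, rather than inventing a new LangChain-level abstraction.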
Hi @jacoblee93. We looked into how Google's context caching works, and there are a few points to note:
- There is a minimum token count for cached content, so whatever we cache needs to be large enough.
- The cache is stored on Google's side, and a key is returned to access it, along with a TTL for the cache. We will therefore need to modify the model's structure to store this information (rough sketch below). Do you think these two factors will break anything we have right now?
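To illustrate what we mean by "store this information for the model", something like the following is what we have in mind. This is purely hypothetical: `cachedContent` is not an existing `ChatGoogleGenerativeAI` option, and the cache name is made up.

```typescript
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const model = new ChatGoogleGenerativeAI({
  model: "gemini-1.5-flash-001",
  // Hypothetical option: the name/key returned when the cache was created,
  // only valid until its TTL expires.
  cachedContent: "cachedContents/abc123",
});

const res = await model.invoke("Summarize the cached transcript.");
```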
Thanks for the reply, we will take a look at this!