Cannot send cached content to a batch job.
Description of the feature request:
I already posted this on the forum but didn't get any response.
I was playing around with the Gemini API's context caching and batch requests in order to put something together for project 4 of this year's DeepMind GSoC projects.
However, I can't find a way to pass any reference to the created cache in the batch request. Currently, I create a cache, put the cache name in input.jsonl, upload that file to GCS, and point my batch job at its URI. However, the batch job gives me some default answers (ones that are irrelevant to my question).
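For context, here's roughly the flow I'm following (a minimal sketch, not my exact code; the project, bucket, file, and model names are placeholders, and the module paths may differ by SDK version):

import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import Part
from vertexai.batch_prediction import BatchPredictionJob

vertexai.init(project="my-project", location="us-central1")  # placeholder project

# 1. Create the cache around the large context (here, a placeholder video).
cached = caching.CachedContent.create(
    model_name="gemini-1.5-flash-001",
    contents=[Part.from_uri("gs://my-bucket/lecture.mp4", mime_type="video/mp4")],
)
print(cached.name)  # "cachedContents/..." -- this is what goes into input.jsonl

# 2. After uploading input.jsonl to GCS, submit the batch job against it.
job = BatchPredictionJob.submit(
    source_model="gemini-2.0-flash-001",
    input_dataset="gs://my-bucket/input.jsonl",
    output_uri_prefix="gs://my-bucket/output/",
)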
Once I view predictions.jsonl in my storage bucket, I see the error.
This is the input.jsonl:
{"request": {"contents": [{"role": "user", "parts": [{"text": "What is a turing machine?"}]}], "generationConfig": {"temperature": 0.4}, "tools": [{"cachedContent": "cachedContents/ckhjh0c70hgw"}]}}
{"request": {"contents": [{"role": "user", "parts": [{"text": "What is the main topic of the video that is in the cache context given to you?"}]}], "generationConfig": {"temperature": 0.4}, "tools": [{"cachedContent": "cachedContents/ckhjh0c70hgw"}]}}
I get irrelevant responses in my terminal, like:
0  2025-03-07 20:49:51.165000+00:00
request:  {'contents': [{'parts': [{'file_data': None, 'text': 'List objects in this image.'}, {'file_data': {'file_uri': 'gs://cloud-samples-data/generative-ai/image/office-desk.jpeg', 'mime_type': 'image/jpeg'}, 'text': None}], 'role': 'user'}], 'generationConfig': {'temperature': 0.4}}
response: {'candidates': [{'avgLogprobs': -0.181431194521346, 'content': {'parts': [{'text': 'Here are the objects in the image:\n- Globe\n- Eiffel Tower\n- Airplane\n- Tablet\n- Shopping cart\n- Present\n- Coffee cup\n- Keyboard\n- Mouse\n- Passport\n- Sunglasses\n- Money\n- Notebook\n- Pen'}], 'role': 'model'}, 'finishReason': 'STOP'}], 'createTime': '2025-03-07T20:49:51.453664Z', 'modelVersion': 'gemini-2.0-flash-001@default', 'responseId': '71vLZ6DYG6mUhMIPyqGIkAs', 'usageMetadata': {'candidatesTokenCount': 53, 'candidatesTokensDetails': [{'modality': 'TEXT', 'tokenCount': 53}], 'promptTokenCount': 1812, 'promptTokensDetails': [{'modality': 'IMAGE', 'tokenCount': 1806}, {'modality': 'TEXT', 'tokenCount': 6}], 'totalTokenCount': 1865}}
In predictions.jsonl, I see the 400 error:
{"status":"Bad Request: {\"error\": {\"code\": 400, \"message\": \"Invalid JSON payload received. Unknown name \\\"cachedContent\\\" at 'tools[0]': Cannot find field.\", \"status\": \"INVALID_ARGUMENT\", \"details\": [{\"@type\": \"type.googleapis.com/google.rpc.BadRequest\", \"fieldViolations\": [{\"field\": \"tools[0]\", \"description\": \"Invalid JSON payload received. Unknown name \\\"cachedContent\\\" at 'tools[0]': Cannot find field.\"}]}]}}","processed_time":"2025-03-13T18:58:33.522+00:00","request":{"contents":[{"parts":[{"text":"What is a turing machine?"}],"role":"user"}],"generationConfig":{"temperature":0.4},"tools":[{"cachedContent":"cachedContents/ckhjh0c70hgw"}]},"response":{}}
{"status":"Bad Request: {\"error\": {\"code\": 400, \"message\": \"Invalid JSON payload received. Unknown name \\\"cachedContent\\\" at 'tools[0]': Cannot find field.\", \"status\": \"INVALID_ARGUMENT\", \"details\": [{\"@type\": \"type.googleapis.com/google.rpc.BadRequest\", \"fieldViolations\": [{\"field\": \"tools[0]\", \"description\": \"Invalid JSON payload received. Unknown name \\\"cachedContent\\\" at 'tools[0]': Cannot find field.\"}]}]}}","processed_time":"2025-03-13T18:58:33.522+00:00","request":{"contents":[{"parts":[{"text":"What is the main topic of the video that is in the cache context given to you?"}],"role":"user"}],"generationConfig":{"temperature":0.4},"tools":[{"cachedContent":"cachedContents/ckhjh0c70hgw"}]},"response":{}}
No mentor has been assigned to the task yet, so any help is appreciated. Thanks!
What problem are you trying to solve with this feature?
Being able to send a reference of cached content to batch jobs.
Any other information you'd like to share?
No response
Hi @mehedikhan72, here's my approach for using Gemini to handle context caching and batched requests asynchronously. Detailed code is available in this PR:
import asyncio

async def batch_predict_async(prompts, cached_context, history, max_concurrent=5):
    # Cap the number of in-flight requests with a semaphore.
    semaphore = asyncio.Semaphore(max_concurrent)
    results = []

    async def sem_call(prompt):
        async with semaphore:
            # call_gemini_api is defined in the linked PR; it sends one prompt
            # along with the cached context and the running chat history.
            result = await call_gemini_api(prompt, cached_context, history)
            history.append(result)
            return result

    tasks = [asyncio.create_task(sem_call(p)) for p in prompts]
    for task in asyncio.as_completed(tasks):
        results.append(await task)
    return results
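For example, it can be driven like this (the prompt strings are placeholders, and cached_context/history stand in for whatever your own cache/session setup provides):

if __name__ == "__main__":
    sample_prompts = ["What is a Turing machine?", "Summarize the cached video."]
    # cached_context and history come from your own cache creation step.
    out = asyncio.run(batch_predict_async(sample_prompts, cached_context=None, history=[]))
    print(out)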
In your current snippet, it appears you're placing cachedContent under the tools field. The API does not recognize cachedContent inside the tools array, which is what produces the "Unknown name 'cachedContent'" error. I think the correct JSON structure for using Google's Gemini API with cached context might be:
{
"contents": [
{
"role": "user",
"parts": [{"text": "What is a Turing machine?"}]
}
],
"generationConfig": {
"temperature": 0.4
},
"cachedContent": "cachedContents/ckhjh0c70hgw"
}
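If you want to sanity-check the cache outside of a batch job first, something like this should work (a sketch, assuming the cache was created through the Vertex AI SDK and the module paths match your SDK version):

from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel

# Look up the existing cache by its resource name and attach it to a model.
cached = caching.CachedContent(cached_content_name="cachedContents/ckhjh0c70hgw")
model = GenerativeModel.from_cached_content(cached_content=cached)
print(model.generate_content("What is a Turing machine?").text)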
Since I can't see your full code, this is just my suspicion; you can find more details about caching in their official documentation.
Hope it helps!
Hi @william-Dic, I'm following the docs' way of doing batch requests, except they don't do it with caches.
Also, in your code you're making concurrent calls to the Gemini API, right? That doesn't really solve the issue as I understand it: for multiple queries, you should make a single batch request (one API call) rather than many separate calls. Correct me if I'm wrong.
I've also tried it without 'tools', as you suggested, and the result is the same.
Hi @mehedikhan72, try this JSON format:
{"request": {"contents": [...], "generationConfig": {...}, "tools": [{"googleCachedContent": {"cachedContentId": "cachedContents/ckhjh0c70hgw"}}]}}
Hi @mehedikhan72, I see in the context caching documentation that the supported models for context caching are:
- Stable versions of Gemini 1.5 Flash
- Stable versions of Gemini 1.5 Pro
And in the batch processing documentation the supported models are:
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
So I don't think the API supports attaching the cache to batches.
Hey, please note that in the Gemini API, context caching is now enabled by default for batch requests; it is not something you can configure manually. For more details, please refer to the technical documentation.
Thanks!
Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.
This issue was closed because it has been inactive for 27 days. Please post a new issue if you need further assistance. Thanks!