
503 The service is currently unavailable when using the context caching feature

Open okada1220 opened this issue 1 year ago • 15 comments

Description of the bug:

I'm trying to create a cache by reading the contents of multiple PDF files, but when the total number of tokens in the files exceeds approximately 500,000, I receive a 503 error (Service Unavailable) from google-api-core.

The error isn't returned immediately, but after about 40 to 50 seconds, which might indicate that a timeout is occurring in google-api-core.

Code

import google.generativeai as genai
import os

gemini_api_key = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=gemini_api_key)

documents = []
file_list = ["xxx.pdf", "yyy.pdf", ...]
for file in file_list:
  gemini_file = genai.upload_file(path=file, display_name=file)
  documents.append(gemini_file)

gemini_client = genai.GenerativeModel("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"total_token: {total_token}")
# total_token: 592403

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)

Version

  • Python 3.9.19
  • google==3.0.0
  • google-ai-generativelanguage==0.6.6
  • google-api-core==2.19.0
  • google-api-python-client==2.105.0
  • google-auth==2.29.0
  • google-auth-httplib2==0.2.0
  • google-generativeai==0.7.2
  • googleapis-common-protos==1.63.0

Actual vs expected behavior:

Actual behavior

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The service is currently unavailable."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.175.234:443 {created_time:"2024-08-06T13:37:03.077186006+09:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
  File "/usr/local/lib/python3.9/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 874, in create_cached_content
    response = rpc(
  File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

Expected behavior

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
print(gemini_cache)

# CachedContent(
#     name='cachedContents/l5ataay9naq2',
#     model='models/gemini-1.5-flash-001',
#     display_name='sample',
#     usage_metadata={
#         'total_token_count': 592403,
#     },
#     create_time=2024-08-08 01:21:44.925021+00:00,
#     update_time=2024-08-08 01:21:44.925021+00:00,
#     expire_time=2024-08-08 02:21:43.787890+00:00
# )

Any other information you'd like to share?

  • https://ai.google.dev/gemini-api/docs/caching?lang=python#considerations

The minimum input token count for context caching is 32,768, and the maximum is the same as the maximum for the given model. (For more on counting tokens, see the Token guide).

Upon reviewing the Gemini API documentation, I noticed a mismatch regarding token limits. The maximum token count is described as depending on the specific model in use; I'm using the models/gemini-1.5-flash-001 model, which has a maximum input token limit of 1,048,576. Based on this, I initially assumed that processing around 500,000 tokens should work without any issues.

Moreover, I was able to generate the cache successfully, even with token counts exceeding 800,000, when creating it from a string. This leads me to suspect that the bug is specific to creating caches from files with high token counts, as opposed to string-based caching.
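
For reference, the string-based call that succeeded looked roughly like this (a minimal sketch rather than my exact script; the text file path is a placeholder):

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# One large text source (placeholder path) instead of uploaded PDF files.
long_text = open("large_document.txt", encoding="utf-8").read()

gemini_cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="sample-text",
    contents=[long_text],  # plain string content; worked past 800k tokens without a 503
)
print(gemini_cache.usage_metadata)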

okada1220 avatar Aug 08 '24 02:08 okada1220

I'm experiencing the same issue even when using models/gemini-1.5-pro-001 and trying to cache roughly 300k tokens, even though it has an input token limit of 2,097,152.

gurugecl avatar Aug 11 '24 01:08 gurugecl

@okada1220,

Thank you for reporting this issue. This looks like an intermittent error and should work now. Automatic retry logic has been added to the SDK to avoid these errors, and you can follow the google-gemini/cookbook#469 FR for examples of retry logic. Thanks
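
For reference, passing a retry policy to generate_content via request_options looks roughly like the sketch below (the backoff values are illustrative, not prescriptive):

from google.api_core import exceptions, retry
import google.generativeai as genai

model = genai.GenerativeModel("models/gemini-1.5-flash-001")
response = model.generate_content(
    "Explain context caching in one paragraph.",
    request_options={
        # Retry transient 503s with exponential backoff (values are illustrative).
        "retry": retry.Retry(
            predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
            initial=1.0,
            maximum=60.0,
            multiplier=2.0,
            timeout=300.0,
        )
    },
)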

singhniraj08 avatar Aug 13 '24 04:08 singhniraj08

@singhniraj08 Thank you for your response.

I checked again, and it seems that the same error is still occurring...

I looked at the retry logic example in google-gemini/cookbook#469, which seems to apply when using request_options with generate_content. But since I'm using genai.caching.CachedContent.create, which doesn't accept request_options, I'm wondering whether this retry logic is still applicable here. Do you think this approach will work in my case?
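
If it isn't, I suppose the closest equivalent would be wrapping the call myself with google.api_core's Retry, something like this sketch (the backoff values are just illustrative):

from google.api_core import exceptions, retry
import google.generativeai as genai

# CachedContent.create has no request_options, so wrap it in a manual retry policy.
create_with_retry = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    initial=1.0,
    maximum=30.0,
    multiplier=2.0,
    timeout=300.0,
)(genai.caching.CachedContent.create)

# `documents` is the list of uploaded files from my first snippet above.
gemini_cache = create_with_retry(
    model="models/gemini-1.5-flash-001",
    display_name="sample",
    contents=documents,
)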

okada1220 avatar Aug 14 '24 05:08 okada1220

I'm receiving this error too

nate-walter avatar Oct 29 '24 23:10 nate-walter

I'm experiencing the same issue. Context caching with PDF files raises HTTP 503, while directly injecting strings into the cache works. Any update on this issue?

Balzard avatar Jan 15 '25 20:01 Balzard

Also having the exact same issue with a collection of large PDFs.

sobjornstad avatar Jan 24 '25 22:01 sobjornstad

This might be an internal API error because I've gotten 503 using TypeScript.

BrianHung avatar Jan 28 '25 01:01 BrianHung

I'm also having this issue.

Note that this still doesn't work even if you base64-encode the PDFs.

import os
from pathlib import Path
from base64 import b64encode

import google.generativeai as genai

os.environ["GOOGLE_API_KEY"] = "xxxxxx"

pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]

encoded_pdfs = [b64encode(path.read_bytes()).decode("utf-8") for path in pdf_paths]
contents = [
    {
        "role": "user",
        "parts": [{"inline_data": {"mime_type": "application/pdf", "data": content}}],
    }
    for content in encoded_pdfs
]

cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)

print("Cache created:", cache)
Full error stack trace
Traceback (most recent call last):
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
                             ^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
           ^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1198, in with_call
    return _end_unary_response_blocking(state, call, True, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The service is currently unavailable."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:216.58.212.202:443 {created_time:"2025-01-29T16:49:05.933467+02:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/adam/Documents/1 Projects/Landstack/document_summary/utils/pdfs.py", line 163, in <module>
    cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 883, in create_cached_content
    response = rpc(
               ^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

codeananda avatar Jan 29 '25 11:01 codeananda

Update: switching to VertexAI did the trick for me!

  1. Install libraries: pip install vertexai langchain_google_vertexai
  2. Set .env vars
  3. Chunk PDFs below 50MB (a rough pypdf sketch is shown after the .env contents below)
  4. Upload to GCS and get the uris
  5. Cache using the vertexai implementation

.env contents

VERTEX_PROJECT_ID=xxxxxx
VERTEX_LOCATION=xxxxxx
VERTEX_CREDENTIAL_PATH=path_to_credentials.json
GOOGLE_APPLICATION_CREDENTIALS=path_to_credentials.json
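
Step 3 (chunking PDFs below 50MB) isn't shown in the main code below; here's a rough sketch using pypdf (an assumed extra dependency, pip install pypdf), with the page count as a knob for keeping each chunk under the size limit:

from pathlib import Path

from pypdf import PdfReader, PdfWriter


def chunk_pdf(pdf_path: Path, out_dir: Path, pages_per_chunk: int = 200) -> list[Path]:
    """Split a PDF into smaller PDFs of at most pages_per_chunk pages each.

    Page count is only a rough proxy for file size; pick a value that keeps
    each chunk comfortably under the ~50MB limit for your documents.
    """
    reader = PdfReader(str(pdf_path))
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    for start in range(0, len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        for page in reader.pages[start : start + pages_per_chunk]:
            writer.add_page(page)
        chunk_path = out_dir / f"{pdf_path.stem}_part{start // pages_per_chunk + 1}.pdf"
        with open(chunk_path, "wb") as f:
            writer.write(f)
        chunk_paths.append(chunk_path)
    return chunk_paths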

I've also found that the Vertex AI 1.5 Pro model can handle many more tokens and requests per minute than AI Studio. I had no retry errors with the former and endless issues with the latter.

from pathlib import Path
from google.cloud import storage
from google.cloud.exceptions import NotFound
from vertexai.generative_models import Part, Content
from vertexai.preview import caching
from langchain_google_vertexai import ChatVertexAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatVertexAI(model_name="gemini-1.5-pro-002")

pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]

gcs_uris = [
    upload_pdf_to_gcs(pdf_path, 'bucket-name') for pdf_path in pdf_paths
]

parts = [Part.from_uri(uri=uri, mime_type="application/pdf") for uri in gcs_uris]
contents = [Content(role="user", parts=parts)]

cached_pdfs = caching.CachedContent.create(
    model_name=llm.model_name, contents=contents
)

llm.cached_content = cached_pdfs.name
upload_pdf_to_gcs definition
def upload_pdf_to_gcs(
    pdf_path: str | Path,
    bucket_name: str,
    destination_blob_name: str | None = None,
    create_bucket: bool = True,
) -> str:
    """
    Upload a PDF file to Google Cloud Storage and return its URI.

    Parameters
    ----------
    pdf_path : str | Path
        Local path to the PDF file
    bucket_name : str
        Name of the GCS bucket to upload to
    destination_blob_name : str | None
        Name to give the file in GCS. If None, uses the original file path
    create_bucket : bool, default True
        If True, creates the bucket if it doesn't exist
    """
    pdf_path = Path(pdf_path)

    if not destination_blob_name:
        destination_blob_name = str(pdf_path)

    client = storage.Client()

    try:
        bucket = client.get_bucket(bucket_name)
    except NotFound:
        if create_bucket:
            bucket = client.create_bucket(bucket_name)
        else:
            raise

    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(str(pdf_path))

    file_uri = f"gs://{bucket_name}/{destination_blob_name}"
    return file_uri
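
From there, a normal LangChain call uses the cached context; usage looks something like this (invoke is standard LangChain, nothing caching-specific):

# The cached PDFs are attached via llm.cached_content above; invoke as usual.
response = llm.invoke("Summarize the key points across the uploaded PDFs.")
print(response.content)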

codeananda avatar Jan 30 '25 14:01 codeananda

For anyone finding this thread in mid-2025 and wondering if this is still an issue, the answer is: yes. I'm getting many, many 503 errors when uploading two PDFs of ~100 pages each. Tried increasing the backoff in my retry logic; it didn't help. Will be moving to Vertex AI.

ctg5 avatar Apr 29 '25 10:04 ctg5

Still an issue for me. Today I was able to create one context (yesterday, none) with the demo file from the docs, but not anymore... no useful error message, just "service not available". When I make the context too small, it gives me my token count and the minimum count; when I make the context too big, it gives me my token count and the maximum count. When the context is just right, the service is down...

sebastian-305 avatar Apr 29 '25 12:04 sebastian-305

You can see the status of the issue: "Open". So nothing is fixed yet. I'm using the REST API from .NET and have the same issue for larger files. It seems there's a hardcoded 80-second timeout for all cache creation.

ovicrisan avatar Apr 29 '25 12:04 ovicrisan

Facing the same issue.

devagarwal007 avatar Apr 30 '25 18:04 devagarwal007

Switching to Vertex AI will help; see my answer above for a Python implementation.

codeananda avatar May 01 '25 10:05 codeananda

Switching to Vertex worked for me as well. Here is a Ruby implementation that avoids the google-cloud-ai_platform gem, which I found to be out of date and not adding much value anyway. Hope it helps.

require 'google/cloud/storage'
require 'googleauth'

class VertexAdapter
  PROJECT_ID = ENV['VERTEX_PROJECT_ID']
  LOCATION = ENV['VERTEX_LOCATION']
  MODEL_NAME = "gemini-2.5-pro-preview-03-25"
  BUCKET_NAME = ENV["GOOGLE_CLOUD_STORAGE_BUCKET"]
  BASE_URL = "https://#{LOCATION}-aiplatform.googleapis.com"
  PARENT = "projects/#{PROJECT_ID}/locations/#{LOCATION}"  
  HTTP_TIMEOUT = 480
  CACHE_TTL = "1800s"

  def initialize
    scope = "https://www.googleapis.com/auth/cloud-platform"
    authorization = Google::Auth.get_application_default(scope)
    token = authorization.fetch_access_token!["access_token"]
    @api_headers = { "Content-Type" => "application/json", "Authorization" => "Bearer #{token}" }
  end

  def upload_file(pdf_content, s3_key)
    Rails.logger.info("Uploading to Vertex (Google Cloud Storage) API")
    
    storage = Google::Cloud::Storage.new(
      project_id: ENV["VERTEX_PROJECT_ID"]
    )

    # Fetch the bucket, creating it if it doesn't exist
    bucket = storage.bucket(BUCKET_NAME) || storage.create_bucket(BUCKET_NAME)

    # Upload the file to GCS under the given key
    file = bucket.create_file(
      StringIO.new(pdf_content),
      s3_key,
      content_type: 'application/pdf'
    )

    file.public_url
  end

  def check_cache(cache_id)
    Rails.logger.info("Checking cache for #{cache_id}")
    begin
      endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents/#{cache_id}"
      Rails.logger.info("Checking cache for #{endpoint}")
      response = HTTParty.get(endpoint, headers: @api_headers)
      if response.code == 200
        Rails.logger.info("Cache found for #{cache_id}")
        true
      else
        Rails.logger.info("Cache not found for #{cache_id}")
        false
      end
    rescue => e
      # On network or server error, handle the same as a cache miss
      Rails.logger.error("Cache lookup error: #{e.message}")
      false
    end
  end

  def delete_cache(cache_id)
    Rails.logger.info("Deleting cache for #{cache_id}")
    begin
      endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents/#{cache_id}"
      response = HTTParty.delete(endpoint, headers: @api_headers)
      if response.code == 200
        Rails.logger.info("Cache deleted for #{cache_id}")
        true
      else
        false
      end
    rescue => e
      # On network or server error, handle the same as a cache miss
      Rails.logger.error("Cache delete error: #{e.message}")
      false
    end
  end

  def cache_file(s3_key)
    system_prompt = File.read(Rails.root.join("lib", "instructions", "system_prompt.md"))
    body = {
      "model": "projects/#{PROJECT_ID}/locations/#{LOCATION}/publishers/google/models/#{MODEL_NAME}",
      "contents":[
        {
          "parts":[
            {"file_data": {"mime_type": "application/pdf", "file_uri": "gs://#{BUCKET_NAME}/#{s3_key}"}},
          ],
          "role": "user"
        }
      ],
      "systemInstruction": {
        "parts": [
          {
            "text": system_prompt
          }
        ],
        "role": "system"
      },
      "ttl": CACHE_TTL
    }
    
    endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents"
    response = HTTParty.post(
      endpoint,
      headers: @api_headers,
      body: body.to_json,
      timeout: HTTP_TIMEOUT
    )

    if response.code == 200
      cached_name = JSON.parse(response.body)["name"]
      Rails.logger.debug("Successfully cached as #{cached_name}")
      cached_name
    else
      Rails.logger.error("API Error (#{response.code}): #{response.body}")
      raise RuntimeError, "API Error (#{response.code}): #{response.body}"
    end
  end

  def generate_content(cached_name)
    Rails.logger.info("Generating content from cached context")

    user_prompt = File.read(Rails.root.join("lib", "instructions", "user_prompt.md"))
    
    body = {
      "contents": [
        {
          "role": "user",
          "parts": [
            {
              "text": user_prompt,
            },
          ]
        },
      ],
      "cachedContent": cached_name,
      "generationConfig": {
        "responseMimeType": "text/plain",
      },
    }

    endpoint = "#{BASE_URL}/v1/#{PARENT}/publishers/google/models/#{MODEL_NAME}:generateContent"
    response = HTTParty.post(
      endpoint,
      headers: @api_headers,
      body: body.to_json,
      timeout: HTTP_TIMEOUT
    )

    if response.code == 200
      response_json = JSON.parse(response.body)
      response_json.dig("candidates", 0, "content", "parts", 0, "text")
    else
      Rails.logger.error("API Error (#{response.code}): #{response.body}")
      raise RuntimeError, "API Error (#{response.code}): #{response.body}"
    end
  end
end

kmatthews812 avatar May 01 '25 21:05 kmatthews812