adk-python Error 429 RESOURCE_EXHAUSTED when using VertexAiRagRetrieval tool

Describe the bug

I found this bug while making some tests with my diagnostic agent earlier today. I have a Vertex AI RAG corpus to store some schema information which my agent retrieves before making queries. The setup for the tool is as follows:

import os

from google.adk.tools.retrieval.vertex_ai_rag_retrieval import VertexAiRagRetrieval
from vertexai.preview import rag

schema_discovery = VertexAiRagRetrieval(
    name='schema_discovery',
    description=(
        'Use this tool to retrieve osquery table schema documentation,'
    ),
    rag_resources=[
        rag.RagResource(
            rag_corpus=os.environ.get("RAG_CORPORA_URI")
        )
    ],
    similarity_top_k=10,
    vector_distance_threshold=0.6,
)

This code worked until Saturday, but today it fails with a 429. At first I assumed it was a genuine quota problem (although, as a Googler, I have very generous quotas). I then switched back to my old implementation, which uses a regular function call instead, and to my surprise it works normally with no 429 errors:

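import json
import os

from google.protobuf.json_format import MessageToDict
from vertexai.preview import rag

def discover_schema(search_phrase: str) -> str: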
  """Discovers osquery table names and schemas based on a descriptive search phrase.

  Args:
    search_phrase: A phrase describing the kind of information you're looking for. 
      For example: 'user login events' or 'network traffic'.

  Returns:
    Table names and schema information for tables related to the search phrase.
  """
  rag_corpora_uri = os.environ.get('RAG_CORPORA_URI')
  response = rag.retrieval_query(
      rag_resources=[
          rag.RagResource(
              rag_corpus=rag_corpora_uri,
          )
      ],
      text=search_phrase,
  )
  return json.dumps(MessageToDict(response._pb))

I could not find any obvious reason why calling the Vertex AI SDK directly works without problems while the ADK tool does not.

To Reproduce

Create a rag corpus in Vertex AI RAG Engine (can be empty)
Create a new agent with adk create agent
Add the following code to agent.py

from google.adk.agents.llm_agent import Agent

from google.adk.tools.retrieval.vertex_ai_rag_retrieval import VertexAiRagRetrieval
from vertexai.preview import rag
import os

rag_query = VertexAiRagRetrieval(
    name='rag_query',
    description=(
        'Use this tool to query the rag,'
    ),
    rag_resources=[
        rag.RagResource(
            rag_corpus=os.environ.get("RAG_CORPORA_URI")
        )
    ],
    similarity_top_k=10,
    vector_distance_threshold=0.6,
)

root_agent = Agent(
    model='gemini-2.5-flash',
    name='root_agent',
    description='A helpful assistant for user questions.',
    instruction='Answer user questions to the best of your knowledge',
    tools=[
        rag_query,
    ]
)

Run adk web and say "hello" to the agent

You should get the 429 error.

This implementation directly calling Vertex AI works perfectly:

from vertexai.preview import rag
import os
import json
from google.protobuf.json_format import MessageToDict

def rag_query(search_phrase: str) -> str:
  """Use this tool to query the rag."""
  rag_corpora_uri = os.environ.get('RAG_CORPORA_URI')
  response = rag.retrieval_query(
      rag_resources=[
          rag.RagResource(
              rag_corpus=rag_corpora_uri,
          )
      ],
      text=search_phrase,
  )
  return json.dumps(MessageToDict(response._pb))

Expected behavior

The agent should greet the user without raising an error.

Desktop (please complete the following information):

OS: Linux
Python version(python -V): 3.12.3
ADK version(pip show google-adk): 1.16.0

Model Information:

Are you using LiteLLM: No
Which model is being used: gemini-2.5-flash

Additional context

It was working without any errors just two days ago. Downgrading to ADK 1.15.0 didn't solve the issue. Calling Vertex AI RAG directly doesn't have the problem.

Oct 28 '25 15:10 danicat

@danicat Your workaround using rag.retrieval_query() directly is correct - the issue is that VertexAiRagRetrieval adds RAG config to every Gemini 2+ request (even "hello"), likely triggering validation calls that exhaust quota. Your direct API only calls RAG when needed, which is why it works. We'll update the tool to use function declaration for all models to fix this, but your current solution is the right approach for now. Thanks!
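
As a rough illustration of the difference described above, here is a toy model of the two request shapes. It assumes, as this comment suggests, that the built-in tool attaches retrieval settings to every model request, while a function tool only advertises a declaration the model may or may not call. ToyRequest and both builder functions are hypothetical names made up for this sketch; they are not ADK or Vertex AI internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-in for a model request; not an ADK or Vertex AI type.
@dataclass
class ToyRequest:
    message: str
    # Settings the model server would use to call RAG on its own.
    retrieval_config: Optional[dict] = None
    # Names of functions the model may decide to call.
    function_declarations: List[str] = field(default_factory=list)


def build_with_builtin_tool(message: str, corpus: str) -> ToyRequest:
    """Built-in style: the RAG settings ride along on every request."""
    return ToyRequest(message=message, retrieval_config={"rag_corpus": corpus})


def build_with_function_tool(message: str) -> ToyRequest:
    """Function style: only the declaration is sent; RAG runs if the model calls it."""
    return ToyRequest(message=message, function_declarations=["rag_query"])


builtin = build_with_builtin_tool("hello", "projects/p/locations/l/ragCorpora/1")
function = build_with_function_tool("hello")
print(builtin.retrieval_config is not None)  # RAG settings present even for "hello"
print(function.retrieval_config is None)     # nothing RAG-related unless called
```

In this toy model, a plain "hello" carries retrieval settings only in the built-in case, which matches the behaviour reported in this issue.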

Oct 31 '25 19:10 surajksharma07

Thanks for the clarification! I was also confused about why the rag call was being included in a simple "hello". One benefit of the function approach is that it includes the "search query" in the request log, something I wasn't able to see in the current implementation of the tool.

Nov 01 '25 13:11 danicat

@danicat The function approach gives you explicit control over the search query, which makes debugging and monitoring much easier. With the current tool implementation, the query generation happens internally within the model's function calling, so it's not visible in your logs. This is definitely another advantage of your workaround - you get both better quota management and better observability. The planned fix to use function declarations should preserve this visibility while solving the quota issue. Thanks!
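
One way to make the search query show up in your own logs with the function approach is to wrap the retrieval call. This is a minimal sketch: make_logged_rag_tool and the logger name are hypothetical, and the retrieve argument stands in for whatever function performs the actual rag.retrieval_query() call (for example discover_schema from the issue description).

```python
import logging
from typing import Callable

logger = logging.getLogger("rag_tool")  # hypothetical logger name


def make_logged_rag_tool(retrieve: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a retrieval function so each search phrase is logged before the call."""

    def rag_query(search_phrase: str) -> str:
        """Use this tool to query the rag."""
        logger.info("RAG search query: %r", search_phrase)
        return retrieve(search_phrase)

    return rag_query


# Usage, assuming discover_schema is defined as in the issue description:
# rag_query = make_logged_rag_tool(discover_schema)
```

The wrapper keeps the same signature as the original function, so it can be passed to the agent's tools list in the same way.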

Nov 01 '25 14:11 surajksharma07

I've hit the 429 too, but I'm not sure which quota I was hitting. Also, may I ask what the canonical solution is: should I use the workaround, or wait for the fix (if one is planned)?

Nov 04 '25 06:11 thipoktham

@thipoktham I recommend you use the workaround until this is fixed in a new release

Nov 04 '25 14:11 danicat

For the 429 error with ADK version 1.9, I think there is a built-in retry mechanism that can be configured as an interim measure. Here is the gist:

from google.adk.agents.llm_agent import Agent
from google.adk.models.google_llm import Gemini
from google.genai import types as genai_types

root_agent = Agent(
    model=Gemini(
        model='gemini-2.5-flash',
        retry_options=genai_types.HttpRetryOptions(
            initial_delay=1,
            max_delay=60,
            attempts=5,
            exp_base=2,
            jitter=0.1,
            http_status_codes=[429, 503]
        )
    ),
    name='root_agent',
    description='A helpful assistant for user questions.',
    instruction='Answer user questions to the best of your knowledge',
    tools=[
        rag_query,
    ]
)
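
To get a feel for how long these settings could make a request wait, here is a rough sketch of a capped exponential backoff schedule. It assumes the wait before retry n is initial_delay * exp_base**(n-1), capped at max_delay, and that attempts counts the first call as well. It ignores jitter and is not taken from the google-genai source, so the real delays may differ.

```python
def backoff_schedule(initial_delay, max_delay, attempts, exp_base):
    """Return the assumed wait (in seconds) before each retry, without jitter."""
    retries = max(attempts - 1, 0)  # the first attempt is not a retry
    return [min(initial_delay * exp_base ** n, max_delay) for n in range(retries)]


# Same values as the retry options above.
print(backoff_schedule(initial_delay=1, max_delay=60, attempts=5, exp_base=2))
# [1, 2, 4, 8]
```

Under these assumptions the agent would wait about 15 seconds in total before giving up on a request that keeps returning 429.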

Nov 05 '25 06:11 adityapandey216

@adityapandey216 Good catch on the retry mechanism. That's definitely useful as an interim mitigation to handle transient 429s, though it won't solve the underlying issue of unnecessary RAG calls happening on every request.

For anyone hitting this issue, here's the recommended approach:

Best solution: Use the direct rag.retrieval_query() workaround (as @danicat showed) - gives you better quota management and visibility
Interim mitigation: Add retry options (as @adityapandey216 suggested) if you need to stick with the current tool implementation
Long-term: We're investigating how to update the tool so that it uses function declarations properly.

Thanks for the helpful suggestion!

Nov 05 '25 08:11 surajksharma07

@xuanyang15 Please have a look into it.

Nov 05 '25 08:11 surajksharma07

Calling RAG via a custom tool is different from using the VertexAiRagRetrieval tool. VertexAiRagRetrieval is a built-in tool, which means it is the Model API server that calls the RAG API, and it usually propagates the credentials and resource project that ADK used to call the Model API (via the GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION environment variables) to the RAG API.

When you use the RAG client directly, is it also using the same project and location? (You can double-check.)
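
A quick way to check this is to compare the project and location embedded in the corpus resource name with the environment variables mentioned above. The sketch below assumes the corpus name has the form projects/{project}/locations/{location}/ragCorpora/{id}; compare_rag_and_model_targets is a hypothetical helper, not part of ADK or the Vertex AI SDK. Note that the resource name may contain a project number rather than a project ID, in which case a mismatch reported here is not conclusive.

```python
import os
import re
from typing import Mapping, Optional

# Assumed shape of a RAG corpus resource name.
_CORPUS_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/ragCorpora/(?P<corpus>[^/]+)$"
)


def compare_rag_and_model_targets(
    corpus_name: str, env: Optional[Mapping[str, str]] = None
) -> dict:
    """Compare the corpus project/location with the ones used for model calls."""
    env = os.environ if env is None else env
    match = _CORPUS_RE.match(corpus_name or "")
    if match is None:
        raise ValueError(f"Unexpected RAG corpus resource name: {corpus_name!r}")
    model_project = env.get("GOOGLE_CLOUD_PROJECT")
    model_location = env.get("GOOGLE_CLOUD_LOCATION")
    return {
        "corpus_project": match["project"],
        "corpus_location": match["location"],
        "model_project": model_project,
        "model_location": model_location,
        "same_project": match["project"] == model_project,
        "same_location": match["location"] == model_location,
    }


# Usage, with the environment variable from the issue description:
# print(compare_rag_and_model_targets(os.environ.get("RAG_CORPORA_URI")))
```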

Dec 01 '25 18:12 seanzhou1023