Error 429 RESOURCE_EXHAUSTED when using VertexAiRagRetrieval tool
Describe the bug
I found this bug while running some tests with my diagnostic agent earlier today. I have a Vertex AI RAG corpus that stores some schema information, which my agent retrieves before making queries. The tool is set up as follows:
import os

from google.adk.tools.retrieval.vertex_ai_rag_retrieval import VertexAiRagRetrieval
from vertexai.preview import rag

schema_discovery = VertexAiRagRetrieval(
    name='schema_discovery',
    description=(
        'Use this tool to retrieve osquery table schema documentation,'
    ),
    rag_resources=[
        rag.RagResource(
            rag_corpus=os.environ.get("RAG_CORPORA_URI")
        )
    ],
    similarity_top_k=10,
    vector_distance_threshold=0.6,
)
This code was working until Saturday, but today it fails with a 429. I was surprised, since as a Googler I have very generous quotas, so I doubted it was really a quota problem. I then switched back to my old implementation, which uses a regular function call instead, and to my surprise it works normally with no 429 errors:
import json
import os

from google.protobuf.json_format import MessageToDict
from vertexai.preview import rag


def discover_schema(search_phrase: str) -> str:
    """Discovers osquery table names and schemas based on a descriptive search phrase.

    Args:
        search_phrase: A phrase describing the kind of information you're looking for.
            For example: 'user login events' or 'network traffic'.

    Returns:
        Table names and schema information for tables related to the search phrase.
    """
    rag_corpora_uri = os.environ.get('RAG_CORPORA_URI')
    response = rag.retrieval_query(
        rag_resources=[
            rag.RagResource(
                rag_corpus=rag_corpora_uri,
            )
        ],
        text=search_phrase,
    )
    return json.dumps(MessageToDict(response._pb))
I could not find any obvious reason why calling the Vertex AI SDK directly has no problems while using the ADK tool does.
To Reproduce
- Create a RAG corpus in Vertex AI RAG Engine (it can be empty)
- Create a new agent with `adk create agent`
- Add the following code to `agent.py`:
from google.adk.agents.llm_agent import Agent
from google.adk.tools.retrieval.vertex_ai_rag_retrieval import VertexAiRagRetrieval
from vertexai.preview import rag
import os

rag_query = VertexAiRagRetrieval(
    name='rag_query',
    description=(
        'Use this tool to query the rag,'
    ),
    rag_resources=[
        rag.RagResource(
            rag_corpus=os.environ.get("RAG_CORPORA_URI")
        )
    ],
    similarity_top_k=10,
    vector_distance_threshold=0.6,
)

root_agent = Agent(
    model='gemini-2.5-flash',
    name='root_agent',
    description='A helpful assistant for user questions.',
    instruction='Answer user questions to the best of your knowledge',
    tools=[
        rag_query,
    ]
)
- Run `adk web` and say "hello" to the agent

You should get the 429 error.
This implementation, which calls Vertex AI directly, works perfectly:
from vertexai.preview import rag
import os
import json
from google.protobuf.json_format import MessageToDict


def rag_query(search_phrase: str) -> str:
    """Use this tool to query the rag."""
    rag_corpora_uri = os.environ.get('RAG_CORPORA_URI')
    response = rag.retrieval_query(
        rag_resources=[
            rag.RagResource(
                rag_corpus=rag_corpora_uri,
            )
        ],
        text=search_phrase,
    )
    return json.dumps(MessageToDict(response._pb))
Expected behavior: The agent should greet the user without erroring.
Desktop (please complete the following information):
- OS: Linux
- Python version (`python -V`): 3.12.3
- ADK version (`pip show google-adk`): 1.16.0
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.5-flash
Additional context: It was working without any errors just two days ago. Downgrading to ADK 1.15.0 didn't solve the issue. Calling Vertex AI RAG directly doesn't have the problem.
@danicat Your workaround using `rag.retrieval_query()` directly is correct - the issue is that `VertexAiRagRetrieval` adds the RAG config to every Gemini 2+ request (even "hello"), likely triggering validation calls that exhaust quota. Your direct API call only hits RAG when needed, which is why it works.
We'll update the tool to use a function declaration for all models to fix this, but your current solution is the right approach for now.
Thanks!
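For anyone following along, here is a minimal sketch of that pattern (not the exact planned fix, just the workaround @danicat described, assuming the same RAG_CORPORA_URI environment variable and that Vertex AI has already been initialized), wiring the direct retrieval call into the agent as a plain function tool so the RAG API is only hit when the model actually calls the tool:

import json
import os

from google.adk.agents.llm_agent import Agent
from google.protobuf.json_format import MessageToDict
from vertexai.preview import rag


def rag_query(search_phrase: str) -> str:
    """Use this tool to query the RAG corpus."""
    # Same direct call as in the workaround above; it only runs when the model
    # decides to invoke this tool.
    response = rag.retrieval_query(
        rag_resources=[
            rag.RagResource(rag_corpus=os.environ.get('RAG_CORPORA_URI')),
        ],
        text=search_phrase,
    )
    return json.dumps(MessageToDict(response._pb))


# ADK exposes a plain Python function to the model as a function declaration,
# so retrieval happens on demand instead of being attached to every request.
root_agent = Agent(
    model='gemini-2.5-flash',
    name='root_agent',
    description='A helpful assistant for user questions.',
    instruction='Answer user questions to the best of your knowledge',
    tools=[rag_query],
)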
Thanks for the clarification! I was also confused about why the rag call was being included in a simple "hello". One benefit of the function approach is that it includes the "search query" in the request log, something I wasn't able to see in the current implementation of the tool.
@danicat The function approach gives you explicit control over the search query, which makes debugging and monitoring much easier. With the current tool implementation, the query generation happens internally within the model's function calling, so it's not visible in your logs. This is definitely another advantage of your workaround - you get both better quota management and better observability. The planned fix to use function declarations should preserve this visibility while solving the quota issue. Thanks!
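As a rough illustration of that observability point, a single log line in the workaround function is enough to surface the model-generated query in your own logs (the logger setup below is just an example, not part of ADK; the retrieval call is repeated from the workaround for completeness):

import json
import logging
import os

from google.protobuf.json_format import MessageToDict
from vertexai.preview import rag

logger = logging.getLogger(__name__)


def rag_query(search_phrase: str) -> str:
    """Use this tool to query the RAG corpus."""
    # Logging the model-generated search phrase makes it visible in your logs,
    # which the built-in tool does not currently expose.
    logger.info('RAG search phrase: %s', search_phrase)
    response = rag.retrieval_query(
        rag_resources=[
            rag.RagResource(rag_corpus=os.environ.get('RAG_CORPORA_URI')),
        ],
        text=search_phrase,
    )
    return json.dumps(MessageToDict(response._pb))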
I'm getting the 429 too, but I'm not sure which quota I was hitting. Also, may I ask what the canonical solution should be, i.e. should I use the workaround, or wait for the fix (if one is planned)?
@thipoktham I recommend you use the workaround until this is fixed in a new release
For the 429 error: with ADK 1.9 and later, I think there is a built-in retry mechanism that can be configured as an interim measure. Here is the gist:
from google.adk.models.google_llm import Gemini
from google.genai import types as genai_types

root_agent = Agent(
    model=Gemini(
        model='gemini-2.5-flash',
        retry_options=genai_types.HttpRetryOptions(
            initial_delay=1,
            max_delay=60,
            attempts=5,
            exp_base=2,
            jitter=0.1,
            http_status_codes=[429, 503]
        )
    ),
    name='root_agent',
    description='A helpful assistant for user questions.',
    instruction='Answer user questions to the best of your knowledge',
    tools=[
        rag_query,
    ]
)
@adityapandey216 Good catch on the retry mechanism. That's definitely useful as an interim mitigation to handle transient 429s, though it won't solve the underlying issue of unnecessary RAG calls happening on every request.
For anyone hitting this issue, here's the recommended approach:
- Best solution: Use the direct `rag.retrieval_query()` workaround (as @danicat showed); it gives you better quota management and visibility
- Interim mitigation: Add retry options (as @adityapandey216 suggested) if you need to stick with the current tool implementation
- Long-term: We're looking into updating the tool to use function declarations properly.
Thanks for the helpful suggestion!
@xuanyang15 Please have a look into it.
Calling RAG via a custom tool is different from using the VertexAiRagRetrieval tool. VertexAiRagRetrieval is a built-in tool, which means the Model API server is what calls the RAG API, and it usually propagates the credentials and resource project that ADK used to call the Model API (via the environment variables GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION) to the RAG API.
When using the RAG client directly, is it also using the same project / location? (You can double-check.)
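For example, a quick sketch of how you could double-check which project and location the direct client resolves (this just prints the standard environment variables and initializes the SDK explicitly; adjust for however you configure credentials):

import os

import vertexai

# The built-in tool path propagates these variables when calling the RAG API,
# so verify the direct client is pointed at the same project and location.
project = os.environ.get('GOOGLE_CLOUD_PROJECT')
location = os.environ.get('GOOGLE_CLOUD_LOCATION')
print(f'GOOGLE_CLOUD_PROJECT={project}  GOOGLE_CLOUD_LOCATION={location}')

# Initializing explicitly removes any ambiguity about which project/location
# rag.retrieval_query() will use.
vertexai.init(project=project, location=location)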