[Feature Request] Support for configuring retrieval count (rankLimit/top_k) in File Search

Open Bochyn opened this issue 1 month ago • 3 comments

Description of the feature request:

I would like to request a configuration option within types.FileSearch (or the tool_config definition) to specify the maximum number of chunks/documents retrieved from the File Search Store during a generation call.

Currently, the retrieval limit appears to be a fixed internal default (likely around 5-10 chunks). I propose adding a parameter like rank_limit, top_k, or a retrieval_config object that allows developers to increase this number (e.g., to 20, 50, or more).

What problem are you trying to solve with this feature?

I am working with a File Search Store containing a large number of documents that share high semantic overlap (e.g., similar terminology across different files).

In scenarios where a user's query is slightly generic, the specific target document often ranks lower in the semantic search results (e.g., at position 12 or 15), while the top slots are filled with related but strictly less relevant documents.

Because the current SDK/API restricts the retrieval window to a small default number, the relevant document is cut off ("rank truncation"), and the model hallucinates or states that the information is missing, even though it exists in the store. Increasing the retrieval window is a standard feature in most RAG architectures to handle such "diluted signal" scenarios.
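The "rank truncation" effect described above can be illustrated with a small, self-contained sketch. It does not call the File Search API; the ranked list, the document names, and the `retrieve` and `rank_of` helpers are all hypothetical and only model a retriever that returns the first `top_k` results of an already-ranked list.

```python
from typing import List, Optional


def retrieve(ranked_doc_ids: List[str], top_k: int) -> List[str]:
    """Return the first top_k entries of an already-ranked result list."""
    return ranked_doc_ids[:top_k]


def rank_of(ranked_doc_ids: List[str], target: str) -> Optional[int]:
    """Return the 1-based rank of target, or None if it is absent."""
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id == target:
            return position
    return None


# Hypothetical ranking: eleven related-but-less-relevant documents outrank
# the document the user actually needs, which lands at position 12.
ranked = (
    [f"related_{i}" for i in range(1, 12)]
    + ["target"]
    + [f"other_{i}" for i in range(1, 9)]
)

small_window = retrieve(ranked, top_k=5)    # narrow retrieval window
large_window = retrieve(ranked, top_k=20)   # wider retrieval window

print("rank of target:", rank_of(ranked, "target"))
print("target in small window:", "target" in small_window)
print("target in large window:", "target" in large_window)
```

With the narrow window the target never reaches the model, so widening the window is the only way to recover it without changing the ranking itself.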

Any other information you'd like to share?

Here is a hypothetical example of how this configuration could look in the Python SDK:

tools=[
    types.Tool(
        file_search=types.FileSearch(
            file_search_store_names=[store.name],
            # PROPOSED PARAMETER:
            retrieval_config={
                "max_retrieved_chunks": 25 
            }
        )
    )
]

This feature would greatly enhance the robustness of Gemini's RAG capabilities for complex knowledge bases.

Bochyn avatar Nov 20 '25 22:11 Bochyn

$:5000

Alexisloyolamend3z avatar Nov 21 '25 11:11 Alexisloyolamend3z

$:5000

What does it mean?

Bochyn avatar Nov 23 '25 00:11 Bochyn

Good feedback - thanks! (b/463199395 for googlers)

markmcd avatar Nov 24 '25 03:11 markmcd

Is this issue still open? I would love to work on this, @markmcd @Giom-V.

ishaanxgupta avatar Dec 15 '25 14:12 ishaanxgupta

Hey @markmcd! I wanted to pick up some issues to implement and was checking out the relevant code for this. In the SDK repository, I found this definition for top_k within the File Search tool definition:

top_k: Optional[int] = Field(
    default=None,
    description="""The number of file search retrieval chunks to retrieve.""",
)

Is this already the implementation of the requested change, or is there still something to add?

tanish1729 avatar Dec 15 '25 21:12 tanish1729

Yeah it does appear to have already been added - feel free to try it out and send a PR for the cookbook if it works.

markmcd avatar Dec 16 '25 04:12 markmcd

Hi @markmcd and @Giom-V,

I was referring to the official Python SDK definition here: https://googleapis.github.io/python-genai/genai.html#genai.types.FileSearch.top_k

FileSearch already exposes a top_k parameter, and when I tested this locally, setting top_k did affect the number of retrieved chunks (e.g., with top_k=N, only N chunks were returned).

Given this, it seems the requested functionality may already be available via the existing top_k field. If there’s additional behavior expected beyond this (e.g., different defaults, wiring in other SDKs, or documentation gaps), please let me know.

When top_k was set to 1, only 1 chunk was retrieved.
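For readers who want to try this, here is a minimal sketch of a File Search tool configuration that sets top_k. It uses only the two field names that appear in this thread (file_search_store_names and top_k) and builds a plain dict so it runs without the SDK installed. The helper name `build_file_search_tool`, the store name, and the validation rule are hypothetical, and passing a dict where the SDK expects types.Tool is an assumption that should be checked against the SDK documentation.

```python
from typing import Any, Dict, List


def build_file_search_tool(store_names: List[str], top_k: int) -> Dict[str, Any]:
    """Build a File Search tool config as a plain dict (hypothetical helper).

    Assumption: the SDK coerces this dict into types.Tool. The typed form
    would be types.Tool(file_search=types.FileSearch(
        file_search_store_names=[...], top_k=...)).
    """
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {
        "file_search": {
            "file_search_store_names": list(store_names),
            "top_k": top_k,
        }
    }


# Hypothetical store name, used only for illustration.
tool = build_file_search_tool(["fileSearchStores/example-store"], top_k=25)
print(tool)
```

The resulting dict would go in the tools list of the generation config, in the same place as the types.Tool object in the original proposal.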

ved015 avatar Dec 16 '25 08:12 ved015

Thanks everyone! The cookbook is now updated with the new field.

markmcd avatar Dec 16 '25 09:12 markmcd