Add Deterministic Retrieval Mode (Stable Global→Local Routing, No Hops/Planner/Sampling)

Open yuer-dsl opened this issue 1 month ago • 0 comments

Do you need to file an issue?

  • [x] I have searched the existing issues and this feature is not already filed.
  • [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [ ] I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

📌 Feature Request: Deterministic Retrieval Mode (Stable Global→Local Routing)

Hi team,

I’ve been experimenting with global→local retrieval patterns and built a very small PoC demonstrating a fully deterministic RAG pipeline — no planners, no hops, no sampling, and no hidden randomness.

🔗 Repo (minimal PoC)

https://github.com/yuer-dsl/deterministic-rag-poc

🧪 Example implementation

deterministic_rag_poc.py


🔍 Why this matters

Many RAG systems (including GraphRAG) rely on:

  • Multi-hop reasoning
  • Planner-generated routes
  • Sampling/temperature in intermediate steps

These introduce hidden randomness and make end-to-end reproducibility difficult.

For regulated, audit-sensitive, or high-reliability environments, we often need:

Same corpus + same query → same route → same output.

A deterministic mode can give GraphRAG a high-certainty retrieval path alongside its dynamic graph-native strengths.


🧩 What the PoC demonstrates

The PoC uses:

  • TF-IDF
  • KMeans with fixed seed
  • Deterministic community assignment
  • Deterministic exact search inside cluster
  • No sampling, no planner, no hop expansion

The goal is not to replace graph traversal — just to offer a strict, reproducible routing mode.
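For readers who want to see the shape of such a pipeline without opening the repo, here is a minimal, dependency-free sketch of the steps listed above: TF-IDF vectors, k-means with a fixed seed, routing to the nearest centroid, and exact similarity search inside the chosen cluster. It is not the code from `deterministic_rag_poc.py`; the function names (`build_index`, `retrieve`), the whitespace tokenizer, and the tie-breaking rules are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def embed(tokens, idf):
    """TF-IDF weight the known tokens and L2-normalise the result."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def sq_dist(a, b):
    return dot(a, a) - 2.0 * dot(a, b) + dot(b, b)


def nearest(vec, centroids):
    # min() keeps the first minimum, so ties go to the lowest cluster id.
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def mean(vectors):
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def build_index(docs, k=2, seed=0, iters=20):
    """Global step: cluster the corpus once, with a fixed seed."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + len(docs)) / (1 + n)) + 1.0 for t, n in df.items()}
    vecs = [embed(toks, idf) for toks in tokenized]
    # Fixed seed: the same corpus always gets the same initial centroids.
    rng = random.Random(seed)
    centroids = [dict(vecs[i]) for i in rng.sample(range(len(vecs)), k)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(v, centroids) for v in vecs]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    return {"idf": idf, "vecs": vecs, "centroids": centroids, "labels": labels}


def retrieve(index, query, top_k=2):
    """Local step: exact cosine search inside the routed cluster."""
    q = embed(query.lower().split(), index["idf"])
    cluster = nearest(q, index["centroids"])
    members = [i for i, l in enumerate(index["labels"]) if l == cluster]
    # Sort by similarity, then by document index, so ties are stable.
    ranked = sorted(members, key=lambda i: (-dot(q, index["vecs"][i]), i))
    return {"cluster": cluster, "hits": ranked[:top_k]}


if __name__ == "__main__":
    corpus = [
        "graph communities summarise entities",
        "graph traversal expands hops",
        "kmeans clusters tf idf vectors",
        "audit pipelines need reproducible output",
    ]
    print(retrieve(build_index(corpus), "tf idf clusters"))
```

Every source of variation is pinned here: the seed fixes the initial centroids, and both the cluster choice and the ranking break ties by index. Rebuilding the index and repeating the query should therefore return the same cluster and the same hits.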


💡 Proposal

Add an optional configuration:

deterministic_mode = true

When enabled:

  • Global routing uses fixed clustering or a deterministic partition
  • Local search uses exact/deterministic similarity
  • LLM calls disable sampling
  • Planner and hop expansion are disabled
  • Same input → same routing trace → same output

This enables:

  • Reproducibility
  • Research comparisons
  • Compliance/audit pipelines
  • Deterministic evaluation
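To make the proposal concrete, here is a rough sketch of how a single flag could override the non-deterministic settings, plus a fingerprint that lets two runs be compared. Nothing in it is GraphRAG's actual configuration or API: `RetrievalConfig`, its fields, `resolve`, and `trace_fingerprint` are hypothetical names, and setting `temperature` to 0 is an assumption about what "disable sampling" would mean in practice.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalConfig:
    # Hypothetical fields for illustration; not GraphRAG settings.
    deterministic_mode: bool = False
    use_planner: bool = True
    max_hops: int = 2
    temperature: float = 0.7


def resolve(cfg: RetrievalConfig) -> RetrievalConfig:
    """Return the effective settings; deterministic_mode overrides the rest."""
    if not cfg.deterministic_mode:
        return cfg
    # Assumption: "disable sampling" maps to temperature 0, no planner, no hops.
    return replace(cfg, use_planner=False, max_hops=0, temperature=0.0)


def trace_fingerprint(query: str, route: list, hits: list) -> str:
    """Hash a routing trace so two runs can be compared exactly."""
    payload = json.dumps(
        {"query": query, "route": route, "hits": hits},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    cfg = resolve(RetrievalConfig(deterministic_mode=True))
    print(cfg)
    print(trace_fingerprint("what changed?", ["cluster:1"], [3, 7]))
```

Storing the fingerprint alongside each answer would give an audit pipeline a cheap check that the same input produced the same routing trace. It would not by itself guarantee identical LLM output, since a hosted model may still vary at temperature 0.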

🔧 Possible integration points

  • Add a deterministic branch inside the retrieval pipeline
  • Enable/disable via config or CLI flag
  • Provide simple examples for both modes
  • Allow users to benchmark deterministic vs dynamic behaviors

❓ Open questions

  • Should deterministic routing still leverage communities from the graph?
  • Should this mode restrict multi-hop traversal completely, or just fix its route?
  • What parts of the graph are still meaningful under deterministic constraints?

I’d be happy to help with an example PR if this aligns with your roadmap.
Thanks for your work on GraphRAG — excited to see this grow!

### Describe the solution you'd like

_No response_

### Additional context

_No response_

yuer-dsl • Nov 20 '25 03:11