Add Deterministic Retrieval Mode (Stable Global→Local Routing, No Hops/Planner/Sampling)
Do you need to file an issue?
- [x] I have searched the existing issues and this feature is not already filed.
- [ ] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [ ] I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.
Is your feature request related to a problem? Please describe.
📌 Feature Request: Deterministic Retrieval Mode (Stable Global→Local Routing)
Hi team,
I’ve been experimenting with global→local retrieval patterns and built a very small PoC demonstrating a fully deterministic RAG pipeline — no planners, no hops, no sampling, and no hidden randomness.
🔗 Repo (minimal PoC)
https://github.com/yuer-dsl/deterministic-rag-poc
🧪 Example implementation
🔍 Why this matters
Many RAG systems (including GraphRAG) rely on:
- Multi-hop reasoning
- Planner-generated routes
- Sampling/temperature in intermediate steps
Each of these can introduce hidden nondeterminism, which makes end-to-end reproducibility difficult.
For regulated, audit-sensitive, or high-reliability environments, we often need:
Same corpus + same query → same route → same output.
A deterministic mode can give GraphRAG a high-certainty retrieval path alongside its dynamic graph-native strengths.
🧩 What the PoC demonstrates
The PoC uses:
- TF-IDF
- KMeans with a fixed seed
- Deterministic community assignment
- Deterministic exact search inside the selected cluster
- No sampling, no planner, no hop expansion
The goal is not to replace graph traversal — just to offer a strict, reproducible routing mode.
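To make the routing steps above concrete, here is a minimal, dependency-free sketch of the same idea: TF-IDF vectors, k-means with a seeded initialisation, and exact search inside the routed cluster. It is not the code from the PoC repo and does not use any GraphRAG API; the class `DeterministicIndex`, its parameters, and the tie-breaking rules are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class DeterministicIndex:
    """Hypothetical global->local router: seeded k-means, then exact search."""

    def __init__(self, docs, k=2, seed=0, iters=20):
        tokenised = [d.lower().split() for d in docs]
        # Sorted vocabulary gives a stable column order across runs.
        self.vocab = sorted({t for toks in tokenised for t in toks})
        n = len(docs)
        df = Counter(t for toks in tokenised for t in set(toks))
        # Smoothed IDF so every term gets a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in self.vocab}
        self.vectors = [self._embed(toks) for toks in tokenised]

        # The only source of randomness is this seeded generator.
        rng = random.Random(seed)
        self.centroids = [list(self.vectors[i]) for i in rng.sample(range(n), k)]
        for _ in range(iters):
            labels = [self._nearest(v, range(k)) for v in self.vectors]
            for c in range(k):
                members = [v for v, l in zip(self.vectors, labels) if l == c]
                if members:  # an empty cluster keeps its previous centroid
                    self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        self.labels = [self._nearest(v, range(k)) for v in self.vectors]

    def _embed(self, tokens):
        """L2-normalised TF-IDF vector; out-of-vocabulary tokens are ignored."""
        counts = Counter(tokens)
        vec = [counts[t] * self.idf[t] for t in self.vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def _nearest(self, vec, cluster_ids):
        # Ties are broken by the lowest cluster id, so the choice is stable.
        return min(cluster_ids, key=lambda c: (sqdist(vec, self.centroids[c]), c))

    def route(self, query):
        """Return a routing trace: chosen cluster, best document, and score."""
        qv = self._embed(query.lower().split())
        # Global step: pick the nearest cluster that actually has members.
        cluster = self._nearest(qv, sorted(set(self.labels)))
        # Local step: exhaustive cosine search inside that cluster only.
        candidates = [i for i, l in enumerate(self.labels) if l == cluster]
        best = max(candidates, key=lambda i: (dot(qv, self.vectors[i]), -i))
        return {"cluster": cluster, "doc_id": best, "score": dot(qv, self.vectors[best])}


if __name__ == "__main__":
    corpus = [
        "graph communities summarise entity relations",
        "community reports drive global search",
        "invoices are archived for seven years",
        "audit logs record every invoice change",
    ]
    index = DeterministicIndex(corpus, k=2, seed=42)
    print(index.route("how are invoice audits recorded"))
```

Because every tie is broken by index and the seed is fixed, rebuilding the index on the same corpus and repeating the query should yield an identical trace in the same environment. Bit-for-bit equality across different platforms or library versions is a separate question that this sketch does not address.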
💡 Proposal
Add an optional configuration:
deterministic_mode = true
When enabled:
• Global routing uses fixed clustering or a deterministic partition
• Local search uses exact/deterministic similarity
• LLM calls disable sampling
• Planner and hop expansion are disabled
• Same input → same routing trace → same output
This enables:
• Reproducibility
• Research comparisons
• Compliance/audit pipelines
• Deterministic evaluation
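One way the "same input → same routing trace → same output" property could be checked in an audit or evaluation setting is to hash a canonical serialisation of each run and compare fingerprints. The sketch below is only an illustration: `DeterministicConfig`, its `clustering_seed` field, `trace_fingerprint`, and `check_reproducible` are hypothetical names, not existing GraphRAG settings or functions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeterministicConfig:
    """Hypothetical settings object; only deterministic_mode comes from the proposal."""
    deterministic_mode: bool = True
    clustering_seed: int = 0  # assumed name for the fixed clustering seed


def trace_fingerprint(config, query, route, output):
    """SHA-256 over a canonical JSON form of config, query, route, and output."""
    payload = {
        "config": asdict(config),
        "query": query,
        "route": route,
        "output": output,
    }
    # sort_keys and fixed separators make the serialisation order-independent.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_reproducible(run, config, query, repeats=3):
    """Call run(query) -> (route, output) several times and compare fingerprints."""
    fingerprints = set()
    for _ in range(repeats):
        route, output = run(query)
        fingerprints.add(trace_fingerprint(config, query, route, output))
    if len(fingerprints) != 1:
        raise RuntimeError(f"non-deterministic run: {len(fingerprints)} distinct traces")
    return fingerprints.pop()
```

Storing the fingerprint next to each answer would let a later run on the same corpus and query be compared against it without keeping the full trace.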
🔧 Possible integration points
• Add a deterministic branch inside the retrieval pipeline
• Enable/disable via config or CLI flag
• Provide simple examples for both modes
• Allow users to benchmark deterministic vs dynamic behaviors
❓ Open questions
• Should deterministic routing still leverage communities from the graph?
• Should this mode restrict multi-hop traversal completely, or just fix its route?
• What parts of the graph are still meaningful under deterministic constraints?
I’d be happy to help with an example PR if this aligns with your roadmap.
Thanks for your work on GraphRAG — excited to see this grow!
### Describe the solution you'd like
_No response_
### Additional context
_No response_