[FEATURE]: Support for a local Relevance Index
Feature hasn't been suggested before.
- [x] I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
Problem Summary (from AI): Transformer-based LLMs lose context window data because attention is bandwidth-limited by the softmax function, which forces the model to ignore the vast majority of the context and focus on a few spikes. When the context is too long, the "noise" of irrelevant tokens drowns out the "signal" of relevant ones, and attention defaults to the beginning (anchor tokens) and the end (immediate context) while losing the middle.
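To make the bandwidth claim concrete, here is a minimal sketch (hypothetical scores, not a benchmark) showing how the softmax weight available to a single relevant token shrinks as the number of competing tokens grows:

```ts
// Minimal sketch: softmax over attention scores. As the number of
// competing tokens (n) grows, the weight assigned to one "relevant"
// token with a fixed score advantage keeps shrinking.
function softmax(scores: number[]): number[] {
  const max = scores.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

for (const n of [10, 1_000, 100_000]) {
  // One relevant token scoring 3.0 against n irrelevant tokens scoring 0.0.
  const weights = softmax([3.0, ...Array(n).fill(0.0)]);
  console.log(`n=${n}: relevant token gets ${weights[0].toFixed(4)} of the attention`);
}
```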
Best Possible Solution that I could think of: We need a data structure that lets an AI tell, at a glance (i.e., with a single script command), which entities are relevant to modifications of any specific entity.
AI is able to figure this out fairly well for any individual entity, but it is currently restricted to storing that information in the context window, which is seriously lossy due to the transformer attention architecture described above.
Thus, optimal LLM-assisted programming will require a new data system that supplements source code with relevance weights for the entities the source code represents. The relevance index wouldn't even have to be exhaustive in order to make operations that involve a fuzzy search (one of the main things I want AI to be able to do) more comprehensive.
Relevance weights could also warn agents that a particular entity (some configuration of a transpiler or linter, perhaps) is a Chesterton's Fence: changing it will likely require changes to lines A & B in File X and line C in File Z. Or they could capture something as basic as "this comment above this function has particular relevance to these two lines of logic nested about 25 lines down".
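To make this concrete, here is a minimal sketch of one shape the index could take (every name, path, and line number below is a hypothetical illustration, not a proposed schema):

```ts
// Hypothetical shape for a relevance index. An "entity" is anything
// addressable in the repo: a function, a config key, a comment, a line range.
interface SourceSpan {
  file: string;
  startLine: number;
  endLine: number;
}

interface RelevanceEdge {
  target: SourceSpan;
  weight: number; // 0..1: how likely a change here forces a change there
  note?: string;  // optional rationale readable by humans and agents
}

interface RelevanceEntry {
  entity: SourceSpan;
  edges: RelevanceEdge[];
}

// Example: the Chesterton's Fence case above, expressed as index data.
const transpilerConfig: RelevanceEntry = {
  entity: { file: "babel.config.js", startLine: 1, endLine: 40 },
  edges: [
    {
      target: { file: "FileX.ts", startLine: 12, endLine: 13 }, // lines A & B
      weight: 0.9,
      note: "Import shim assumes the build output format set here.",
    },
    {
      target: { file: "FileZ.ts", startLine: 88, endLine: 88 }, // line C
      weight: 0.7,
      note: "Runtime polyfill must match the transpile target.",
    },
  ],
};
```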
Finally, the relevance index needs to be updated somehow upon test failure, because in my experience, test failures are mainly caused by things slipping through the context window's cracks (or by a failure to test changes, either manually or in CI).
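As a sketch of one possible update rule (the heuristic below is my assumption, reusing the types from the sketch above): when a test fails after an edit, strengthen the edges between the spans that were just edited and the spans the failing test exercises.

```ts
// Hypothetical heuristic: after a test failure, reinforce edges between
// the spans just edited and the spans covered by the failing test.
// `index` maps "file:start-end" keys to RelevanceEntry values.
function reinforceOnTestFailure(
  index: Map<string, RelevanceEntry>,
  editedSpans: SourceSpan[],
  failingTestSpans: SourceSpan[],
  boost = 0.1,
): void {
  for (const edited of editedSpans) {
    const key = `${edited.file}:${edited.startLine}-${edited.endLine}`;
    const entry = index.get(key) ?? { entity: edited, edges: [] };
    for (const covered of failingTestSpans) {
      const existing = entry.edges.find(
        (e) =>
          e.target.file === covered.file &&
          e.target.startLine === covered.startLine,
      );
      if (existing) {
        existing.weight = Math.min(1, existing.weight + boost);
      } else {
        entry.edges.push({
          target: covered,
          weight: boost,
          note: "learned from test failure",
        });
      }
    }
    index.set(key, entry);
  }
}
```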
This data structure would be stored locally and committed to version control, as it could grow to match the size of the codebase itself.
Why OpenCode?: I'd love to see open-source get some bragging rights in the AI war. Also, I'm pretty sure that all the Corps keep their eyes on us.
A little marketing: This isn't the only crazy idea I've had lately. Check out https://github.com/Judahmeek/Significance-Hypothesis-Based-ARC-AGI-2-puzzle-solver for another one.
Searching for potential duplicates...
This issue might be a duplicate of existing issues. Please check:
- #2108: Is there a repository map or RAG feature? - discusses indexing and optimizing LLM understanding of codebase structure
- #1990: [Feature Request] Add User Controls for Context Management - addresses context window limitations and fine-grained control over what stays in context
- #3982: Context tokens resets to zero with every new action of AI agent - discusses context window being reset during multi-turn tasks
Your "relevance index" proposal could complement these existing discussions, particularly #2108's RAG/repository map concept and #1990's context management controls.
Feel free to ignore if your feature request addresses a specific gap these issues don't cover.
I'd note that https://github.com/sst/opencode/issues/2108 injects information about the codebase into the context window. A relevance index could instead be accessed through a tool call, almost entirely bypassing the context window, so far less information is at risk of being lost.
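For example (a hypothetical tool interface, not opencode's actual tool API), a query tool could return only the top-weighted edges for one entity, so the agent pulls in a handful of spans on demand rather than holding the whole index in context:

```ts
// Hypothetical tool handler: given one entity key, return only its
// strongest relevance edges, reusing the types sketched earlier.
function queryRelevance(
  index: Map<string, RelevanceEntry>,
  entityKey: string,
  minWeight = 0.5,
): RelevanceEdge[] {
  const entry = index.get(entityKey);
  if (!entry) return [];
  return entry.edges
    .filter((e) => e.weight >= minWeight)
    .sort((a, b) => b.weight - a.weight);
}
```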
Have you considered making a custom tool that can do this?
It'd be interesting to see how it performs. Right now we are focused on some lower-level things:
- overall stability (bug fixes)
- improvements to baseline tooling: better prompts, better grep (looking at things like ast-grep and mgrep)
- adding highly in-demand features
Typically, other people will have a similar idea and comment on this issue here or on Discord, and that moves it up higher in our priority list.
We have some very important things we need to address first, but I like your idea.
One reviewer suggested indexing the abstract syntax tree (AST) for relevance, which may simplify things.
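A rough sketch of what AST-based seeding could look like, using the TypeScript compiler API (the choice to index only function and class declarations is my assumption; `SourceSpan` is the type sketched earlier):

```ts
import ts from "typescript";

// Sketch: seed index entities from function and class declarations,
// using the AST rather than raw text so entities get stable boundaries.
function seedEntitiesFromAst(fileName: string, sourceText: string): SourceSpan[] {
  const sourceFile = ts.createSourceFile(
    fileName,
    sourceText,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true,
  );
  const spans: SourceSpan[] = [];
  const visit = (node: ts.Node) => {
    if (ts.isFunctionDeclaration(node) || ts.isClassDeclaration(node)) {
      const start = sourceFile.getLineAndCharacterOfPosition(node.getStart(sourceFile)).line + 1;
      const end = sourceFile.getLineAndCharacterOfPosition(node.getEnd()).line + 1;
      spans.push({ file: fileName, startLine: start, endLine: end });
    }
    ts.forEachChild(node, visit);
  };
  visit(sourceFile);
  return spans;
}
```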
Slightly off-topic: in the context of agentic execution, the same reviewer suggested having a different sub-agent handle each level of abstraction, but that suggestion runs into the problem of defining levels of abstraction discretely enough to distinguish between them.