
Feature Request: LLM-Driven Query Rewriting for Enhanced Codebase Retrieval

Open nicejade opened this issue 6 months ago • 0 comments

Validations

  • [x] I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • [x] I'm not able to find an open issue that requests the same enhancement

Problem

Currently, Continue’s code retrieval relies on keyword/semantic matching (e.g. via @codebase) to fetch relevant snippets. We propose adding an LLM-based intent analysis step before the actual retrieval: the system would prompt a language model to interpret the user’s original query (or code comment) and rewrite or expand it into a more explicit, keyword-rich query. After retrieval, the system would use the model again to check whether the returned code aligns with the inferred intent; if not, it would iterate, updating its understanding of the intent and refining the query until the results match the user’s needs.

Optionally, for transparency and debugging, the assistant could display the rewritten queries and intermediate search results to the user. This shows how the system interpreted the request and lets the user correct or refine it.

Proposal

  • Intent Recognition (LLM analysis): Before calling the code search engine, prompt a large language model to read the user’s request and output the underlying intent. For example, the LLM could be prompted with something like “What does the user want? Rewrite this request as a concise intent statement.” This might yield a clarified description of the goal.
  • Query Rewriting (LLM rewrite): Using the inferred intent, prompt the LLM to rephrase or expand the original query into a better search query. For instance, it could add relevant synonyms, context, or break apart a complex question. This is akin to the “Rewrite-Retrieve-Read” approach described in recent research (openreview.net, aclanthology.org), where an LLM first reformulates the query before retrieval.
  • Retrieve: Use the rewritten query to perform the actual codebase search (e.g. via the existing @codebase provider).
  • Alignment Check: Once results are returned, use the LLM (or a simpler heuristic) to compare the content of those results against the original intent. For example, ask the model: “Do these results actually answer the user’s question/intent?”. If the top results are not on target, we loop back.
  • Iterate if Needed: If the retrieved code is off-target, feed that feedback into a second pass of intent/query rewriting. The LLM can refine its understanding of the intent or try a different phrasing. This iterative loop continues until the results satisfy the user’s need (or a maximum number of tries is reached). This idea mirrors the SELF-RAG method for conversational QA, where the model “rewrites the conversation for retrieval and judges the relevance of returned passages” in an iterative manner (arxiv.org). A minimal sketch of this loop follows the list.
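
To make the flow concrete, here is a minimal TypeScript sketch of the proposed rewrite-retrieve-check loop. Every name in it (rewriteAndRetrieve, RetrievedChunk, and the llmComplete/codebaseRetrieve callbacks) is a hypothetical placeholder rather than an existing Continue API, and the prompts are illustrative wording only; the retriever is assumed to be the existing @codebase-style search supplied by the caller.

```typescript
// Minimal sketch of the proposed loop. All names here are hypothetical
// placeholders, not existing Continue APIs.

interface RetrievedChunk {
  filepath: string;
  content: string;
}

async function rewriteAndRetrieve(
  userQuery: string,
  // Assumed callbacks: a plain chat-completion call and the existing
  // @codebase-style retriever, both supplied by the caller.
  llmComplete: (prompt: string) => Promise<string>,
  codebaseRetrieve: (query: string) => Promise<RetrievedChunk[]>,
  maxTries = 3
): Promise<RetrievedChunk[]> {
  // Step 1: ask the LLM for a concise statement of the user's intent.
  let intent = await llmComplete(
    `The user asked: "${userQuery}". Identify the user's goal and state it as one concise intent sentence.`
  );

  let results: RetrievedChunk[] = [];
  for (let attempt = 0; attempt < maxTries; attempt++) {
    // Step 2: rewrite the request into an explicit, keyword-rich search query.
    const searchQuery = await llmComplete(
      `Intent: ${intent}\nRewrite this intent as a clear, keyword-rich search query for code snippets. Return only the query.`
    );

    // Step 3: run the existing codebase retrieval with the rewritten query.
    results = await codebaseRetrieve(searchQuery);

    // Step 4: alignment check. Ask the LLM whether the top results match the intent.
    const verdict = await llmComplete(
      `Intent: ${intent}\nTop results:\n` +
        results
          .slice(0, 3)
          .map((r) => `${r.filepath}\n${r.content}`)
          .join("\n---\n") +
        `\nDo these results answer the intent? Reply YES or NO, then briefly explain.`
    );
    if (verdict.trim().toUpperCase().startsWith("YES")) {
      break;
    }

    // Step 5: results are off-target, so fold the critique back into the intent and retry.
    intent = await llmComplete(
      `The previous search for intent "${intent}" returned off-target results.\n` +
        `Critique: ${verdict}\nRestate the intent so the next search query is more on target.`
    );
  }
  return results;
}
```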

Throughout this process, the prompts to the LLM should be carefully crafted. For example, a prompt might say: “The user asked: ‘[original prompt]’. Identify the user’s goal and rewrite this as a clear search query for code snippets.” This encourages the LLM to focus on intent and useful keywords. We could also use few-shot examples or fine-tune a smaller “rewriter” model for efficiency, as seen in recent work on trainable query rewriters (openreview.net, aclanthology.org).
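
As a rough illustration of what such a prompt could look like, the sketch below assembles a rewrite prompt with two made-up few-shot examples. The function name, the exact wording, and the example request/query pairs are all assumptions for illustration, not a fixed specification.

```typescript
// Illustrative rewriter prompt with made-up few-shot examples; the exact wording
// and examples are assumptions, not a fixed specification.
function buildRewritePrompt(originalPrompt: string): string {
  return [
    "Rewrite the user's request as a clear search query for code snippets.",
    "Focus on the user's goal and include concrete technical keywords.",
    "",
    'Request: "why is login slow sometimes"',
    "Query: authentication login latency session token validation caching",
    "",
    'Request: "where do we turn markdown into html"',
    "Query: markdown to HTML rendering converter parser",
    "",
    `Request: "${originalPrompt}"`,
    "Query:",
  ].join("\n");
}
```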

Benefits

  • More accurate retrieval: By rephrasing queries into the model’s “native” language, we overcome issues of phrasing and synonyms. RAG (retrieve-then-read) systems are known to be sensitive to exact wording (medium.com), so an LLM rewrite can restructure oddly phrased requests, remove irrelevant parts, and insert common technical keywords. For example, it might turn a vague query into a precise search with domain-specific terms. This semantic enhancement typically boosts retrieval performance (medium.com, restack.io).
  • Intent alignment: LLMs excel at grasping context and subtleties in language (thedatafreak.medium.com). Even if a user’s question is incomplete or ambiguous, the LLM can infer the likely intent. This helps avoid the common RAG pitfall where a semantic match surfaces irrelevant results (e.g. matching on shared words but not on actual meaning) (thedatafreak.medium.com). By ensuring the query truly reflects what the user wants, the retrieved code snippets will better satisfy the intent.
  • Improved user satisfaction: Users will get relevant answers more often on the first try. Instead of manually rewording their prompt or poking at the assistant multiple times, the system proactively refines the query. In practice, this leads to a “more effective and user-friendly retrieval experience” and higher satisfaction (restack.io). Users can accomplish tasks with fewer retries or clarifications.
  • Fewer manual refinements: Ambiguous or complex queries often require several manual tweaks. Automating the rewrite reduces these cases. For example, if a user asks a broad question, the LLM can break it into sub-questions or focus on the most relevant aspect. This reduces friction and makes the assistant seem smarter.
  • Robustness to vague queries: When the user’s query is incomplete (missing a keyword or context), the LLM can fill in the gaps. This robustness is crucial in real codebases where a user might remember only part of a function or concept. By inferring the missing pieces, the system avoids dead-ends and “I don’t know” answers. Studies show that even simple LLM-generated query rewrites can significantly improve retrieval hits (arxiv.org).
  • Transparency and debugging: (Optional) Exposing the rewritten queries and interim results can build trust. The research on interactive query understanding suggests that showing the “machine intent” (the LLM’s interpretation) to the user allows for early feedback (ar5iv.labs.arxiv.org). For instance, if the LLM infers the wrong intent, the user can see the misinterpretation and correct it. This transparency is valuable both for end-users (who see how the system thinks) and for developers debugging the retrieval logic.

Optional Enhancements

  • Show rewritten queries: Before executing the search, display the rewritten query (or multiple candidate rewrites) to the user. For example: “Interpreted intent: find code that implements X feature. Searching for: ‘function X implementation search term’.” This gives the user immediate insight. Prior work proposes showing these natural-language “machine intents” so users can “commit” to a query or refine it (ar5iv.labs.arxiv.org).
  • Show intermediate results: After each retrieval attempt, briefly show top results (or a summary) to let the user see if the system is on track. If the user notices a mismatch, they could intervene. This could be a debug log or even a UI dialog for advanced users.
  • Interactive correction: If the system is unsure (low confidence), prompt the user to choose from several rewritten queries or to provide feedback. For instance, “I interpreted your question as __; does that match what you want?” This mirrors the “commit vs refine” strategy in the query understanding framework (ar5iv.labs.arxiv.org); a rough sketch of this interaction follows the list.
  • Model tuning: Over time, collect feedback on the rewrites to fine-tune the rewriter prompts or model. If certain prompts consistently yield poor rewrites, they can be improved. This is similar to techniques of trainable query rewriters that use reinforcement learning on retrieval feedback (openreview.net).
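
As one way the “commit vs refine” interaction mentioned above could be wired up, the sketch below gates a user confirmation on a confidence score. confirmRewrite, RewriteCandidate, the askUser callback, and the 0.7 threshold are all hypothetical assumptions, not part of Continue’s current UI or API.

```typescript
// Hypothetical "commit vs refine" helper: when confidence in the rewrite is low,
// show the candidate interpretations and let the user pick one. askUser stands in
// for some UI hook (e.g. a quick-pick dialog) and is not an existing Continue API.
interface RewriteCandidate {
  intent: string;      // the LLM's interpretation, shown to the user
  searchQuery: string; // the query that would actually be executed
}

async function confirmRewrite(
  candidates: RewriteCandidate[],
  confidence: number,
  askUser: (options: string[]) => Promise<number>,
  confidenceThreshold = 0.7 // assumed cutoff; would need tuning in practice
): Promise<RewriteCandidate> {
  // High confidence (or only one option): commit without interrupting the user.
  if (confidence >= confidenceThreshold || candidates.length === 1) {
    return candidates[0];
  }
  // Low confidence: show each interpreted intent and let the user choose or refine.
  const choice = await askUser(
    candidates.map(
      (c) => `Interpreted intent: ${c.intent}. Search for: "${c.searchQuery}"`
    )
  );
  return candidates[choice] ?? candidates[0];
}
```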

Each of these enhancements aims to make the retrieval pipeline smarter and more user-friendly. By leveraging an LLM for intent analysis and by being transparent about its reasoning, Continue can deliver code snippets that more precisely match the developer’s needs with fewer extra steps.

References: This idea builds on recent research in retrieval-augmented generation that shows query rewriting improves performance (medium.com, openreview.net). In particular, Rewrite-Retrieve-Read pipelines have been shown to yield consistent gains over vanilla retrieval (openreview.net, aclanthology.org), and LLMs can even judge the relevance of results in a loop (arxiv.org). Query rewriting also directly addresses semantic-ambiguity issues by capturing user intent (thedatafreak.medium.com). Finally, interactive query frameworks highlight the value of exposing the system’s interpretation to users (ar5iv.labs.arxiv.org). All these studies support that an LLM-driven rewrite step should make Continue’s code search more accurate, robust, and satisfying to use.

Solution

No response

nicejade · Apr 27 '25 08:04