[Feature Request]: How can I use an agent to improve my search results? (Deepresearch component)
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
I'd like to improve my retrieval speed and effectiveness using an agent-based workflow. I hope to enhance RAGFlow's retrieval performance through related-question generation, multi-path recall, and result summarization. However, this is my first time using an agent, so I'm still testing how it works. During testing, retrieval returns no information at all.
I'm using the knowledge base "Test 1" for my experiment, and I've ensured that my query originates from the knowledge base. But strangely, the retrieved content is empty ("没有检索到数据", i.e. "no data retrieved"). @dosu
To improve your search results with an agent in RAGFlow, you can design workflows that use related question generation, multi-path recall, and summarization. Agents can orchestrate these steps by generating alternative queries (using the /sessions/related_questions API or a Transformer component), retrieving results for each, and summarizing the findings with an LLM node. This approach expands the search space and increases recall, especially for ambiguous or broad queries. You can see more about agent orchestration and prompt configuration in the Agent component docs.
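As a hedged sketch of the related-question step mentioned above: the snippet below only assembles the HTTP request for the `/sessions/related_questions` endpoint. The host, port, auth header, full path prefix, and request field names are assumptions — verify them against your deployment's API reference before sending anything.

```python
import json

# Assumed deployment details -- adjust to your installation.
RAGFLOW_BASE = "http://localhost:9380"   # assumed host/port
API_KEY = "ragflow-xxxx"                 # placeholder API key

def build_related_questions_request(question: str) -> dict:
    """Assemble the pieces of a POST to the related-questions endpoint.

    The path and body schema here are assumptions based on the endpoint
    name; check your RAGFlow version's API docs.
    """
    return {
        "url": f"{RAGFLOW_BASE}/api/v1/sessions/related_questions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"question": question}),
    }

req = build_related_questions_request("How do I tune retrieval?")
# Send with e.g. requests.post(req["url"], headers=req["headers"], data=req["body"])
```

Each returned alternative question can then be fed to a separate Retrieval step for multi-path recall.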
If you're getting empty retrieval results, check these areas:
- Make sure your knowledge base is properly indexed and contains relevant data for your queries. You can manually add keywords or related questions to chunks to improve ranking and recall (dataset config guide).
- Tune retrieval parameters: lower the similarity threshold (default is 0.2), adjust vector similarity weight (default is 0.3), and increase Top N results. Overly strict settings can filter out all results (retrieval reference).
- If using a rerank model, be aware it can slow down retrieval and may require GPU acceleration. Leaving it empty speeds up retrieval (best practices).
- Metadata filters or knowledge graph options can restrict results—review these settings if enabled.
- Recent fixes improved metadata condition handling in retrieval (PR #11338), so make sure your deployment includes these updates.
For agent workflow design, consider chaining a Transformer to extract related questions, then use retrieval for each, and finally summarize with an LLM. You can reference the new user interaction agent template for structure (PR #11185).
If you adjust retrieval parameters during testing, remember to apply them to your workflow component settings—they aren't saved automatically.
Let me know if you want a sample workflow diagram or more details on any step!
To reply, just mention @dosu.
I've checked all of these and there are no problems, but I'm still not getting any results. Could you provide a successful agent configuration example? @dosu
Here’s a concrete, working agent configuration example for RAGFlow that demonstrates effective retrieval and multi-step orchestration. This setup is based on the official “Report Agent Using Knowledge Base” and “Deep Research” templates, which are proven to work for retrieval, related question generation, and summarization workflows.
Minimal Agent Workflow Example
A typical agent workflow connects nodes in series: Begin → Agent → Message. The Agent node orchestrates retrieval and reasoning. Here’s a simplified diagram:
```mermaid
graph LR
    A["Begin"] --> B["Agent"]
    B --> C["Message"]
```
Key parameters for the Retrieval tool inside the Agent:
- `similarity_threshold`: 0.2
- `keywords_similarity_weight`: 0.7
- `top_n`: 8
- `top_k`: 1024
- `rerank_id`: (leave empty for speed)
- `use_kg`: false (unless you need the knowledge graph)
- `kb_ids`: [your knowledge base ID, e.g., "Test 1"]
Agent prompt (system prompt):
You are a Knowledge Base Retrieval Q&A Agent. Break down the user’s question into retrievable subtasks, retrieve from the knowledge base, and generate a structured, evidence-backed answer. Do not fabricate information. If nothing is found, say so.
User prompt:
```
# User Query
{sys.query}
```
Sample JSON snippet for the Agent node:
```json
{
  "component_name": "Agent",
  "params": {
    "llm_id": "your-llm-id",
    "max_tokens": 4096,
    "prompts": [
      {
        "content": "# User Query\n{sys.query}",
        "role": "user"
      }
    ],
    "sys_prompt": "You are a Knowledge Base Retrieval Q&A Agent. ...",
    "tools": [
      {
        "component_name": "Retrieval",
        "params": {
          "kb_ids": ["your-kb-id"],
          "similarity_threshold": 0.2,
          "keywords_similarity_weight": 0.7,
          "top_n": 8,
          "top_k": 1024,
          "rerank_id": "",
          "use_kg": false
        }
      }
    ]
  }
}
```
See the full template for more advanced orchestration:
Report Agent Using Knowledge Base (JSON)
Advanced: Multi-Agent Orchestration
For multi-path recall and summarization, use the “Deep Research” template, which chains subagents for web search, content extraction, and synthesis.
Deep Research Template (JSON)
Troubleshooting Tips
- Make sure all nodes are connected in series; isolated nodes won’t execute.
- Use the latest RAGFlow version (not “nightly-slim”) to avoid retrieval bugs.
- Set metadata filtering to “disabled” for testing.
- If you still get empty results, try prepending `USER:` to your input query before retrieval.
- For Chinese queries, ensure your Elasticsearch analyzer supports Chinese segmentation if using full-text search.
Let me know if you want a full JSON export or help adapting this to your specific workflow!
Why can't I still retrieve the data? The question I'm asking must exist in the knowledge base of the Retrieval tool. @dosu
This is a common issue in RAGFlow agent workflows, and there are several likely causes—even if your question exists in the knowledge base:
- Version-related bug: In some RAGFlow versions (notably before the latest nightly), retrieval works in knowledge base testing but fails in the Agent workflow, regardless of parameter settings. Upgrading to the latest nightly (not slim) Docker image usually resolves this (source, source).
- Metadata filtering: If any metadata filters are set (even unintentionally), retrieval may return empty results. Disable all metadata filters for testing (source, source).
- Embedding model mismatch: The embedding model for your knowledge base must match the one used in the Retrieval tool and agent. If they differ, retrieval silently fails (source, source).
- Chunk availability: All documents must be fully parsed and indexed, and chunks must be marked as available. If you recently upgraded or reconfigured, try re-parsing your documents (source).
- Retrieval parameters: Overly strict similarity thresholds or low Top N values can filter out all results. Try lowering the threshold (e.g., 0.1) and increasing Top N (e.g., 20) (source).
- Knowledge base selection: Ensure the correct knowledge base is explicitly selected in the Retrieval tool's configuration. If no knowledge base is selected, retrieval will fail (source).
- Chinese queries with Elasticsearch: If you use Elasticsearch and your queries are in Chinese, make sure your ES index uses a Chinese analyzer (like IK or smartcn). The default whitespace analyzer will not match Chinese text (source).
- Agent input formatting: In some versions, the Retrieval tool in the agent only works if you prepend a constant like `USER:` to the query. Try this as a workaround (source).
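For the Chinese-analyzer point, a minimal Elasticsearch index-settings sketch looks like the following. This assumes the IK analysis plugin is installed on your ES cluster, and the field name `content_ltks` is illustrative, not necessarily what your deployment uses:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "zh_analyzer": { "type": "custom", "tokenizer": "ik_max_word" }
      }
    }
  },
  "mappings": {
    "properties": {
      "content_ltks": { "type": "text", "analyzer": "zh_analyzer" }
    }
  }
}
```

Without a tokenizer that segments Chinese, full-text matching on Chinese queries will silently return nothing.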
What to do next:
- Upgrade to the latest nightly (not slim) Docker image and re-parse your knowledge base.
- Double-check that your knowledge base, agent, and retrieval tool all use the same embedding model.
- Disable all metadata filters and knowledge graph options for testing.
- Lower the similarity threshold and increase Top N.
- Explicitly select your knowledge base in the Retrieval tool.
- If you still get empty results, try prepending `USER:` to your query.
If the problem persists, please provide a screenshot of your knowledge base chunk list for the query term, your retrieval parameters, and your agent configuration. This will help pinpoint the issue.
How can I use the output of the previous step as the input for the current operation? @dosu
To pass the output of one step as the input for the next in RAGFlow’s workflow editor, you need to reference the previous node’s output variable in the prompt or input field of the current node. Connections between nodes only define execution order—you must explicitly set which output variable to use as input for each component source.
In the UI, when configuring a node (like “知识库检索”), look for the prompt or input box. Type / or click the (x) button to open the variable picker, which lists all available outputs from upstream nodes. Select the output you want (for example, the result from “相关问题生成”), and it will insert a variable like {相关问题生成@output} into your prompt or query field source.
If you want to format or combine outputs, use the “Template” component to process data before passing it to the next node source.
Example:
Suppose your workflow is:
开始 → 相关问题生成 → 知识库检索
- In "知识库检索", set the query field to `{相关问题生成@output}` (or select it from the variable picker).
- If you want to chain multiple outputs, use a Template node to combine them, then reference the template's output in the next step.
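Conceptually, the variable picker performs a substitution like the sketch below. This illustrates the `{node@output}` convention only; it is not RAGFlow's internal implementation:

```python
import re

def resolve_variables(template: str, node_outputs: dict) -> str:
    """Replace {node@output}-style placeholders with upstream node outputs."""
    def repl(match):
        key = match.group(1)
        if key not in node_outputs:
            raise KeyError(f"no upstream output named {key!r}")
        return node_outputs[key]
    return re.sub(r"\{([^{}]+)\}", repl, template)

outputs = {"相关问题生成@output": "What is the default similarity threshold?"}
query = resolve_variables("{相关问题生成@output}", outputs)
print(query)  # the upstream node's output, now usable as the retrieval query
```

If a placeholder names a node that produced no output, resolution fails, which is the "missing input" error mentioned below.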
Diagram:
```mermaid
graph LR
    A["开始"] --> B["相关问题生成"]
    B --> C["知识库检索"]
    C --> D["后续操作"]
```
If you see errors about missing input, double-check that you’ve mapped the correct output variable from the previous node. Recent updates also improved variable aggregation and loop support for more advanced workflows source.
Let me know if you want a concrete JSON or UI screenshot example!
You could generate some sub-queries first, then iterate over all the queries to retrieve, and finally summarize all the retrievals.
Do you have specific cases?
I'm not sure what specific case you're referring to, but I'd like to improve my search performance using an agent workflow. It should achieve the following:
1. Parse the user's query input
2. Generate 3-4 related questions based on the user's query input
3. Use each related question to start a subtask for fragment retrieval
4. Summarize the fragment information obtained from each subtask and output the final answer. @KevinHuSh
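The four steps above can be sketched as the control flow below. The three callables (`generate_related_questions`, `retrieve`, `summarize`) are placeholders you would back with RAGFlow components or API calls; the toy stubs only demonstrate the wiring:

```python
from typing import Callable, List

def multi_path_answer(
    user_query: str,
    generate_related_questions: Callable[[str, int], List[str]],
    retrieve: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
    n_questions: int = 3,
) -> str:
    """Expand the query, retrieve fragments per sub-question, then
    summarize everything into one answer (steps 1-4 above)."""
    questions = [user_query] + generate_related_questions(user_query, n_questions)
    fragments: List[str] = []
    for q in questions:
        fragments.extend(retrieve(q))          # multi-path recall
    return summarize(user_query, fragments)    # final LLM summarization

# Toy stubs, just to exercise the control flow:
gen = lambda q, n: [f"{q} (variant {i})" for i in range(1, n + 1)]
ret = lambda q: [f"chunk for: {q}"]
summ = lambda q, frags: f"Answer to {q!r} from {len(frags)} fragments"
print(multi_path_answer("test query", gen, ret, summ))
```

In a real workflow, `gen` would be the related-question step, `ret` the Retrieval component per sub-question, and `summ` an LLM node fed all retrieved fragments.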