[Question]: How does the model search for content based on a given website?

Open luboyan6 opened this issue 1 month ago • 2 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

When a user's question is not covered in the knowledge base, how can I ensure the model searches the website specified in the system prompt to retrieve and incorporate relevant information into its response?

I’ve included the intended search website in the system prompt, but the model didn’t use it—instead, it searched other websites. How can I resolve this issue?

luboyan6 · Dec 09 '25 08:12

RAGFlow doesn't restrict retrieval to a specific website just because you mention it in the system prompt. Instead, retrieval is controlled by which knowledge base (KB) you select and any metadata filters you apply. The system prompt only tells the LLM how to use retrieved content—it doesn't decide what gets retrieved or from where [source].

To ensure the model only searches a specific website, you have two main options:

  1. Scope your knowledge base: Create a KB that contains only documents from the target website. When you select this KB in the UI, retrieval will be limited to its content [source].
  2. Apply a metadata filter: If your KB contains documents from multiple sites, use a metadata filter (e.g., filter by the domain field) to restrict retrieval to the desired website. The Firecrawl processor and similar tools extract metadata like domain and url for each document, which you can use for this purpose [source].

If you don't set up your KB or filters this way, retrieval may return content from any site included in your KBs. The system prompt alone can't enforce this restriction [source].
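
As a rough illustration of the domain restriction in option 2, here is a minimal Python sketch that post-filters retrieved chunks by their `domain` or `url` metadata. It is not RAGFlow's API: the chunk layout (a dict with a `metadata` field) and the helper names `normalize`, `chunk_domain`, and `filter_by_domain` are assumptions made for this example, and only the `domain` and `url` field names come from the description above.

```python
from urllib.parse import urlparse


def normalize(host: str) -> str:
    """Lowercase a host name and drop a leading 'www.'."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def chunk_domain(chunk: dict) -> str:
    """Read the domain from a chunk's metadata (hypothetical layout)."""
    meta = chunk.get("metadata", {})
    if meta.get("domain"):
        return normalize(meta["domain"])
    # Fall back to parsing the url field when no domain field is present.
    host = urlparse(meta.get("url", "")).hostname or ""
    return normalize(host)


def filter_by_domain(chunks: list, allowed_domain: str) -> list:
    """Keep chunks from allowed_domain or any of its subdomains."""
    allowed = normalize(allowed_domain)
    kept = []
    for chunk in chunks:
        domain = chunk_domain(chunk)
        if domain == allowed or domain.endswith("." + allowed):
            kept.append(chunk)
    return kept


if __name__ == "__main__":
    sample = [
        {"text": "A", "metadata": {"domain": "example.com"}},
        {"text": "B", "metadata": {"url": "https://docs.example.com/b"}},
        {"text": "C", "metadata": {"domain": "other.org"}},
    ]
    for chunk in filter_by_domain(sample, "example.com"):
        print(chunk["text"])
```

Matching on the suffix `"." + allowed` rather than a plain substring keeps look-alike hosts such as `notexample.com` out of the results.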

Common misconfigurations include: not selecting the correct KB, missing or incorrect metadata filters, embedding model mismatches, or incomplete document parsing/indexing [source].

If you want to disable web search entirely and only use your KB, set "disable": true in the websearch component of your assistant's configuration (e.g., websearch_assistant.json) [source].

Let me know if you need an example of how to set up a metadata filter for domain restriction.

dosubot[bot] · Dec 09 '25 08:12

I suggest you try Agent, where you can create an agent that only does "search website".

Magicbook1108 · Dec 09 '25 11:12