[BUG] SearchTool Provides Irrelevant Results in Production (HK) vs. Development (CN)
Description
A critical issue has been identified where the SearchTool (which appears to use Bing search under the hood) behaves inconsistently between our local development environment (Hangzhou, Mainland China) and our production deployment (Hong Kong).
When a query is made, the development environment retrieves highly relevant search results, leading to correct RAG outputs. However, the exact same query in the production environment retrieves completely irrelevant, junk-like results, which severely degrades the quality and accuracy of the final generated answer. This makes the RAG pipeline unreliable in production.
This is a high-priority bug as it fundamentally breaks the retrieval mechanism of ApeRAG in certain common deployment regions.
Environment Discrepancy
| Environment | Location | Observed Behavior | Result Quality |
|---|---|---|---|
| Development | Hangzhou, Mainland China | The underlying search request to `bing.com` is redirected to `cn.bing.com`, which serves correct, localized results. | Excellent |
| Production | Hong Kong | The search request hits the global `bing.com` endpoint directly, which returns completely irrelevant results (e.g., Japanese financial data for a weather query). | Critical Failure |
Steps to Reproduce
The underlying network behavior can be replicated without running the full ApeRAG stack, using cURL to simulate the HTTP requests from the two locations.
- Simulate Production (from a Hong Kong server):

  ```bash
  # Query for "拉斯维加斯天气" (Las Vegas weather)
  curl -vL "https://www.bing.com/search?q=%E6%8B%89%E6%96%AF%E7%BB%B4%E5%8A%A0%E6%96%AF%E5%A4%A9%E6%B0%94"
  ```

  Result: The returned HTML is for unrelated Japanese financial products.

- Simulate Development (from a Mainland China server):

  ```bash
  # Same query
  curl -vL "https://www.bing.com/search?q=%E6%8B%89%E6%96%AF%E7%BB%B4%E5%8A%A0%E6%96%AF%E5%A4%A9%E6%B0%94"
  ```

  Result: The request is redirected to `cn.bing.com`, and the HTML contains relevant results from sites like `zhihu.com`.
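The same probe can be scripted so it runs identically from every deployment host. The sketch below is a minimal standard-library version (Python 3.8+ assumed). Unlike `curl -L`, it deliberately stops at the first response, so it shows directly whether `bing.com` answers with a 302 to `cn.bing.com`. The helper names `first_hop` and `redirected_to_cn` are hypothetical and not part of ApeRAG.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional, Tuple

BING_SEARCH = "https://www.bing.com/search"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so that a 3xx response surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def first_hop(query: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Return (status, Location header) of the first response from bing.com.

    Needs network access; it is only defined here, not called.
    """
    url = BING_SEARCH + "?" + urllib.parse.urlencode({"q": query})
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        # With redirects disabled, a 302 ends up here and still carries its Location header.
        return err.code, err.headers.get("Location")


def redirected_to_cn(status: int, location: Optional[str]) -> bool:
    """True if the first hop is a redirect whose target host is cn.bing.com."""
    if status not in (301, 302, 303, 307, 308) or not location:
        return False
    return urllib.parse.urlsplit(location).hostname == "cn.bing.com"
```

Based on the supporting logs below, `redirected_to_cn(*first_hop("拉斯维加斯天气"))` would be expected to return `True` from the Hangzhou host and `False` from the Hong Kong host.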
Root Cause Analysis
The issue stems from how Bing's servers treat programmatic, non-browser requests from different geographic locations:
- Geographic Routing: Bing correctly routes traffic to different edge nodes based on IP. The Hangzhou IP is routed to Mainland China-specific infrastructure, while the Hong Kong IP is routed to a global/HK node.
- Client-Side Identity: The HTTP client used by ApeRAG's `SearchTool` (and `cURL`) is likely being identified as a "bot" or non-standard client by Bing's global endpoint in Hong Kong. This seems to trigger a fallback or anti-scraping mechanism that serves junk data.
- Redirection Difference: The Mainland China infrastructure is configured to redirect all traffic to `cn.bing.com`, which appears to serve relevant results to all types of clients. The global infrastructure does not redirect, which exposes its different treatment of non-browser user agents.
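A quick way to see how a non-browser client identifies itself is to inspect its default headers. The sketch below uses Python's standard `urllib` purely as an illustration. ApeRAG's `SearchTool` may use a different HTTP client (the follow-up comment points to the `duckduckgo-search` library), whose default headers would differ.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent a default urllib opener sends with every request."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders).get("User-agent", "")


# Prints "Python-urllib/<major>.<minor>" (e.g. "Python-urllib/3.12" on Python 3.12):
# there is no browser token such as "Mozilla/5.0".
print(default_user_agent())
```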
Impact on the ApeRAG Project
- Unreliable Production Deployments: Any ApeRAG application deployed in Hong Kong (or potentially other regions outside Mainland China) will have a non-functional search/retrieval step.
- "It Works On My Machine" Problem: This creates a severe discrepancy between development and production, making it difficult to debug and trust local testing.
- Poor RAG Quality: The core promise of RAG is to provide accurate, context-aware answers. With a faulty retrieval step, the generator produces nonsensical or incorrect information ("garbage in, garbage out").
Proposed Solution / Next Steps
The most direct solution is to make the HTTP requests from ApeRAG's SearchTool appear as if they are coming from a standard web browser.
Recommendation:
Modify the HTTP client within the SearchTool to include a standard set of browser headers. At a minimum, this should include:
- `User-Agent`: e.g., `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36`
- `Accept-Language`: e.g., `en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7`
- `Accept`: `text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7`
This change should make Bing's global servers treat the request as a legitimate user interaction, returning relevant results and resolving the environment inconsistency.
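A minimal sketch of this recommendation, written with the standard library so it runs without extra dependencies. The header values are the ones listed above. Whether ApeRAG's HTTP client accepts a headers mapping in the same way is an assumption, and `build_browser_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

# Browser-like headers, copied from the recommendation above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7",
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,"
        "image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    ),
}


def build_browser_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a Bing search request carrying browser-like headers."""
    url = "https://www.bing.com/search?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


req = build_browser_request("拉斯维加斯天气")
# urllib stores header names via str.capitalize(), so look them up as "User-agent" etc.
```

Sending it with `urllib.request.urlopen(req)` from the Hong Kong host would be the direct test of this hypothesis; the sketch itself makes no claim about what Bing returns.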
A longer-term solution might involve integrating with an official, paid search API (like the Bing Search API), which is designed for programmatic access and should return consistent results.
Supporting Logs
<details>
<summary><b>Full cURL Log from Hong Kong (Production Simulation)</b></summary>
hk.txt log content here...

```
< HTTP/2 200
...
< x-msedge-ref: Ref A: 1B95296EBD2143C0BFF4AA1CAA2697DB Ref B: HKBEDGE0908 Ref C: 2025-09-19T02:05:53Z
...
```

</details>
<details>
<summary><b>Full cURL Log from Hangzhou (Development Simulation)</b></summary>
hz.txt log content here...

```
< HTTP/2 302
< location: https://cn.bing.com/search?q=%E6%8B%89%E6%96%AF%E7%BB%B4%E5%8A%A0%E6%96%AF%E5%A4%A9%E6%B0%94
...
< HTTP/2 200
...
< x-msedge-ref: Ref A: 766A2412CFA14F638355C0212634C9D0 Ref B: BJ1EDGE0719 Ref C: 2025-09-19T02:05:32Z
...
```

</details>
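When comparing runs from several hosts, it can help to extract the edge node from the `x-msedge-ref` header automatically. The sketch assumes the header layout seen in the two logs above (`Ref A: … Ref B: … Ref C: …`). Reading the `Ref B` prefix as a location (`HKBEDGE` for Hong Kong, `BJ1EDGE` for Beijing) is an inference from these two samples, not documented Bing behavior.

```python
import re
from typing import Optional

# Matches the "Ref B: <edge-node>" field of the x-msedge-ref header, as it appears in the logs.
_REF_B = re.compile(r"Ref B:\s*(\S+)")


def edge_node(msedge_ref: str) -> Optional[str]:
    """Return the Ref B edge-node identifier, or None if the field is absent."""
    match = _REF_B.search(msedge_ref)
    return match.group(1) if match else None


hk = "Ref A: 1B95296EBD2143C0BFF4AA1CAA2697DB Ref B: HKBEDGE0908 Ref C: 2025-09-19T02:05:53Z"
hz = "Ref A: 766A2412CFA14F638355C0212634C9D0 Ref B: BJ1EDGE0719 Ref C: 2025-09-19T02:05:32Z"
print(edge_node(hk), edge_node(hz))  # HKBEDGE0908 BJ1EDGE0719
```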
Root Cause Identified: DuckDuckGo Search Provider Redirect Handling Issue
I've identified the actual root cause, which is different from the initial Bing anti-bot hypothesis.
The Real Issue
- Production Environment: The Hong Kong deployment lacks a JINA API key configuration.
- Fallback Mechanism: ApeRAG falls back to the DuckDuckGo search provider when JINA is unavailable.
- Geographic Redirect Problem: DuckDuckGo internally forwards to Bing, but redirects are handled differently by region:
  - Mainland China: `bing.com` → `cn.bing.com` (302 redirect) ✅ Works
  - Hong Kong: `bing.com` serves results directly (no redirect) ❌ Fails
- Library Limitation: The `duckduckgo-search` library doesn't properly handle these geographic redirect differences.
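The failure mode described here is a silent fallback: nothing tells the operator that DuckDuckGo is being used instead of JINA. The sketch below shows one way to make the fallback visible. It is hypothetical: the config shape mirrors the YAML in this comment, and `select_search_provider` is not ApeRAG's actual function (its real selection code is not shown in this thread).

```python
import logging

logger = logging.getLogger("search")


def select_search_provider(config: dict) -> str:
    """Pick "jina" when an API key is configured, otherwise fall back to "duckduckgo" loudly."""
    providers = config.get("providers") or {}
    api_key = (providers.get("jina") or {}).get("api_key")
    if api_key:
        return "jina"
    # Log the fallback so a missing key shows up in production logs, not as bad answers.
    logger.warning("JINA api_key not configured; falling back to DuckDuckGo search")
    return "duckduckgo"
```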
Why Development Works vs Production Fails
| Environment | JINA API Key | Search Provider | Result |
|---|---|---|---|
| Development (Hangzhou) | ✅ Configured | JINA (primary) | ✅ Success |
| Production (Hong Kong) | ❌ Missing | DuckDuckGo (fallback) | ❌ Failure |
Immediate Solution
Configure JINA API keys in your Hong Kong production environment:
```yaml
# Production config
providers:
  jina:
    api_key: "your-jina-api-key"
```
This will bypass the problematic DuckDuckGo fallback entirely and use JINA's robust search infrastructure.
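To keep the Hong Kong deployment from silently drifting back to the fallback, a startup check can refuse to run without a real key. This is a minimal sketch, assuming the configuration has already been parsed into a dict with the `providers.jina.api_key` layout shown above. The placeholder string is the one from the example config, and `require_jina_key` is a hypothetical name.

```python
PLACEHOLDER = "your-jina-api-key"  # value from the example config above


def require_jina_key(config: dict) -> str:
    """Return the configured JINA key, or raise if it is missing, blank, or still the placeholder."""
    providers = config.get("providers") or {}
    api_key = (providers.get("jina") or {}).get("api_key") or ""
    if not api_key.strip() or api_key == PLACEHOLDER:
        raise RuntimeError(
            "providers.jina.api_key is not set; search would fall back to DuckDuckGo"
        )
    return api_key
```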
Alternative Solutions
- Enhanced DuckDuckGo Provider: Improve redirect handling for geographic differences
- Direct Bing Search API: Implement official Bing Search API integration
- Smart Fallback Logic: Add region-aware search provider selection
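Region-aware selection could look roughly like this: each provider lists the regions where it has been verified to return usable results, and the first configured provider that is usable in the current region wins. Provider names, region codes, and the availability table below are illustrative assumptions, not measured data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provider:
    name: str
    configured: bool  # e.g. an API key is present
    # Regions where this provider has been verified to work; empty means "any region".
    regions: frozenset = field(default_factory=frozenset)

    def usable_in(self, region: str) -> bool:
        return self.configured and (not self.regions or region in self.regions)


def pick_provider(providers: List[Provider], region: str) -> Optional[str]:
    """Return the first configured provider usable in `region`, in priority order."""
    for provider in providers:
        if provider.usable_in(region):
            return provider.name
    return None


providers = [
    Provider("jina", configured=False),
    Provider("duckduckgo", configured=True, regions=frozenset({"CN"})),
]
print(pick_provider(providers, "CN"))  # duckduckgo
print(pick_provider(providers, "HK"))  # None
```

Returning `None` for Hong Kong lets the caller fail loudly instead of serving junk results.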
Verification
After configuring the JINA API keys, rerun the queries that previously failed. Search results should then be consistent across geographic regions.
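The verification can be scripted as a small smoke test run from each region after deployment. This is a hypothetical harness: `search` stands in for whatever function returns result text (it is not an ApeRAG API), and the keyword match is only a crude proxy for relevance.

```python
from typing import Callable, Iterable, List, Tuple


def smoke_test(
    search: Callable[[str], List[str]],
    cases: Iterable[Tuple[str, List[str]]],
) -> List[str]:
    """Run each (query, keywords) case; return the queries whose results mention none of the keywords."""
    failures = []
    for query, keywords in cases:
        texts = [str(result).lower() for result in search(query)]
        if not any(k.lower() in text for text in texts for k in keywords):
            failures.append(query)
    return failures


def fake_search(query: str) -> List[str]:
    # Canned result for illustration only; a real check would call the deployed search step.
    return ["Las Vegas weather forecast - 拉斯维加斯天气"]


print(smoke_test(fake_search, [("拉斯维加斯天气", ["weather", "天气"])]))  # []
```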
This issue has been marked as stale because it has been open for 30 days with no activity