Retrieval testing of large knowledge base gives Milvus exception
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.4.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I have created a knowledge base with about 500 documents of roughly 150 pages each, using Milvus as the vector DB. In the chatbot it seems the bot has no access to the information; deeper investigation showed that knowledge retrieval isn't working. In the knowledge retrieval interface I get:
MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=...])
I don't know what I should do now.
✔️ Expected Behavior
knowledge retrieval should work
❌ Actual Behavior
knowledge retrieval not working
The error you are seeing—MilvusException: (code=106, message=failed to search: loaded collection do not found any channel in target, may be in recovery: collection on recovering[collection=...])—indicates that Milvus is unable to search the collection because it is in a recovery state or its data channels are not available. This is typically a Milvus-side issue, not directly caused by Dify, but it can be triggered by configuration, resource, or version mismatches.
Here are some steps and checks based on Dify’s integration with Milvus and common Milvus troubleshooting:
- Check Milvus Collection State
  The error suggests the collection is still recovering or not fully loaded. You can check the collection state using Milvus' management tools or API (see the sketch after this list). Ensure the collection is fully loaded and healthy before running retrieval queries.
- Milvus Version Compatibility
  Dify's Milvus integration checks for hybrid search support and expects Milvus version >= 2.5.0 if hybrid search is enabled. If your Milvus version is older, or if hybrid search is enabled without support, you may encounter issues. You can disable hybrid search by setting MILVUS_ENABLE_HYBRID_SEARCH=False in your .env file or Docker Compose configuration, or upgrade Milvus to at least 2.5.0 if you need this feature. See the MilvusVector implementation.
- Configuration Review
  Double-check your Milvus connection settings in .env or docker-compose.yaml:
  MILVUS_URI=http://milvus-standalone:19530
  MILVUS_TOKEN=
  MILVUS_USER=root
  MILVUS_PASSWORD=Milvus
  MILVUS_ENABLE_HYBRID_SEARCH=False
- Docker Networking
  If you are running Milvus and Dify in Docker, ensure both are on the same Docker network and the URI is reachable from the Dify containers. See the Docker Compose networking advice.
- Restart and Monitor Milvus
  Sometimes, Milvus collections can get stuck in recovery after a crash or resource exhaustion. Restarting the Milvus service and monitoring its logs can help. Make sure your server has enough memory and CPU for the size of your knowledge base.
- No Known Dify-Specific Bugs
  There are currently no open or closed issues in the Dify repository that match this exact Milvus error code or scenario.
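For the first check, here is a minimal sketch of inspecting the collection's load state with pymilvus, assuming a standalone Milvus reachable with the same connection settings as your Dify config; the collection name is a placeholder, since Dify generates its own collection names per knowledge base:

```python
from pymilvus import connections, utility

# Connection settings are assumptions; use the same values as your Dify .env.
connections.connect(uri="http://milvus-standalone:19530", user="root", password="Milvus")

print(utility.list_collections())  # find the collection that backs the failing knowledge base

collection_name = "Vector_index_example"  # hypothetical placeholder
print(utility.load_state(collection_name))          # Loaded / Loading / NotLoad
utility.wait_for_loading_complete(collection_name)  # block until recovery/loading finishes
```

If the state stays in Loading (or the wait times out), the collection is still recovering and retrieval from Dify will keep failing until recovery completes.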
If after these checks the problem persists, it may be necessary to investigate Milvus logs directly or consult Milvus documentation/support for collection recovery issues. If you have recently migrated or bulk-inserted a large amount of data, it may take some time for Milvus to finish recovery, especially with large collections.
Let me know if you need more detailed steps for any of these checks.
Your suggestions regarding the config are already in place. I had to limit the memory consumption of the milvus-standalone container to 80 GB; without that limit, the server with 192 GB RAM came under pressure and the whole Docker environment restarted. I would be happy to check further for a solution, but if 80 GB RAM isn't enough for Milvus, I don't know what to do. I'm using the Docker image milvusdb/milvus:v2.5.9-gpu. @dosu
I've found this web tool, https://milvus.io/tools/sizing, but I just don't know what to fill in. As described, I have about 500 documents with 150 pages each (PDF). All knowledge bases together contain about 32,700 documents. @dosu I cannot add RAM indefinitely; perhaps Milvus is the wrong vector DB then.
To use the Milvus Sizing Tool (https://milvus.io/tools/sizing), you’ll need to estimate two main parameters:
- Number of Vectors:
  This is the total number of embeddings you'll store. In your case, you mentioned about 32,700 documents. If each document is split into multiple chunks (e.g., per paragraph or per page), the actual number of vectors could be much higher. If you know how many chunks or embeddings are generated per document, multiply that by your document count for a more accurate estimate.
- Vector Dimension:
  This depends on the embedding model you use. Common OpenAI and similar models produce vectors of size 1536, 768, or 384. If you're unsure, check your embedding model's documentation or your Dify configuration for the dimension.
Example for the Sizing Tool:
- Number of vectors: 32,700 (or higher if chunked)
- Vector dimension: 1536 (if using OpenAI’s text-embedding-ada-002, for example)
- Choose “Standalone” for your current setup
RAM Limitation & Alternatives:
- Milvus is known to require significant RAM, especially for large collections and high-dimensional vectors. If 80GB RAM is not enough and you cannot increase it, you may run into persistent issues.
- Dify supports several other vector databases, including Qdrant (recommended), Weaviate, Pgvector, Chroma, and others, which may have lower resource requirements or be easier to tune for smaller hardware environments. You can see the full list of supported databases here.
Tips:
- If you want to continue with Milvus, try reducing the number of vectors (e.g., by chunking less aggressively or filtering out less relevant data).
- For a more RAM-efficient solution, consider switching to Qdrant or Weaviate, both of which are supported by Dify and may be easier to run within your hardware limits.
If you can share your embedding model or chunking strategy, I can help you estimate the vector count and dimension more precisely for the sizing tool.
I'm using parent/child chunking with 900/450 and the multilingual-large-e5-instruct embedding model. But @dosu, what I don't understand from your explanations is why all 32,000 documents should be used in the calculation: if I load a knowledge base with 400 documents of max 5 pages each, retrieval works. As soon as I load the large knowledge base with 500 docs of 150 pages each, the error occurs. I never use all knowledge bases with 32,000 docs at once.
Some other source told me to configure this in milvus.yaml:
queryNode:
  memUsageLimitRatio: 0.35
cacheConfig:
  cpuCacheCapacity: 8GB
  insertBufferSize: 4GB
  preloadCollection: false
I haven't done this yet.
Frankly, I don't care whether I use Milvus or Qdrant; it just should work. The only problem is that the whole knowledge base stack with 32,000 documents took weeks to build. Doing this again with Qdrant, and then perhaps hitting the same issues, isn't promising.
Here’s a breakdown of what’s happening and some concrete next steps:
Why document count matters—even if you don’t use all at once
Milvus collections are global: when you load a large knowledgebase (e.g., 500 docs × 150 pages), Milvus must load the entire collection (all vectors for that knowledgebase) into memory for search. Even if you’re not querying all 32,000 documents at once, if your “large” knowledgebase is a single collection, Milvus tries to load all its vectors into RAM for retrieval. This is why smaller knowledgebases work, but larger ones hit memory/resource limits and trigger recovery errors.
Your chunking strategy (Parent/child 900/450) and the multilingual-large-e5-instruct model (dimension 1024) mean each page or chunk creates a high-dimensional vector. Multiply this by the number of chunks per document and the number of documents, and the memory requirements grow rapidly.
About the Milvus config changes
The settings you mentioned:
queryNode:
  memUsageLimitRatio: 0.35
cacheConfig:
  cpuCacheCapacity: 8GB
  insertBufferSize: 4GB
  preloadCollection: false
- memUsageLimitRatio limits how much of the container's memory Milvus will use for query nodes. This can help prevent OOM (out-of-memory) crashes, but if set too low, queries may be slow or fail if there isn't enough RAM to load the collection.
- cpuCacheCapacity and insertBufferSize control how much RAM is used for caching and inserts. Lowering these can help fit within your 80 GB limit, but may slow down performance.
- preloadCollection: false means Milvus won't try to load the whole collection into memory at startup, but will load as needed.
These settings can help avoid Docker restarts, but if your collection is simply too large for available RAM, retrieval will still fail or be slow. You’ll need to experiment to find the right balance.
Using the Milvus Sizing Tool
For the sizing tool:
- Number of vectors: Estimate the number of chunks (child chunks) per document × number of documents in your largest knowledgebase. For example, if each 150-page document is chunked into 100 child chunks, and you have 500 documents, that’s 50,000 vectors.
- Vector dimension: For multilingual-large-e5-instruct, the dimension is 1024.
- Standalone/Distributed: Standalone for your current setup.
Plug these numbers into the tool to get a realistic RAM estimate. If the estimate is above your 80GB limit, you’ll likely keep hitting issues.
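To complement the sizing tool, here is a rough back-of-envelope check; a sketch under assumed numbers only (500 docs, a hypothetical 100 child chunks per document, dimension 1024, float32 vectors):

```python
# Rough RAM estimate for the raw vectors of the large knowledge base.
# All counts are assumptions; replace them with your real chunk statistics.
docs = 500               # documents in the large knowledge base
chunks_per_doc = 100     # hypothetical child-chunk count per 150-page document
dim = 1024               # multilingual-large-e5-instruct embedding dimension
bytes_per_float = 4      # float32

vectors = docs * chunks_per_doc
raw_gib = vectors * dim * bytes_per_float / 1024**3
print(f"{vectors:,} vectors -> {raw_gib:.2f} GiB of raw vector data")
# 50,000 vectors -> ~0.19 GiB raw; indexes, caches, and Milvus internals sit on top
# of this, which is why observed memory usage is far higher than the raw figure.
```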
Should you switch to Qdrant (or another DB)?
Qdrant and Weaviate are generally more RAM-efficient for similar workloads and are supported by Dify. However, you’re right: migrating means re-indexing, which is time-consuming. If you want to test Qdrant without a full rebuild, consider:
- Exporting a subset of your data (e.g., a few large and small knowledgebases) and indexing them in Qdrant to compare performance and resource usage.
- If Qdrant works better, you can plan a staged migration.
Testing plan
- Try the Milvus config changes you listed and monitor memory usage and retrieval performance.
- Use the sizing tool with your actual chunk and vector counts to see if your hardware is sufficient.
- Test Qdrant or Weaviate with a subset of your data to compare.
- If you stay with Milvus, consider splitting very large knowledgebases into smaller collections if possible, or reducing chunking granularity.
Summary
- The issue is due to the total number of vectors and their dimensions in your largest collection, not the total number of knowledgebases.
- The config changes may help, but if your collection is too large for available RAM, retrieval will still fail.
- Qdrant is a good alternative, but test with a subset first to avoid a full rebuild unless you’re confident.
- Use the Milvus Sizing Tool with your actual chunk count and vector dimension to estimate RAM needs.
Let me know if you want help estimating your actual vector count or have questions about testing Qdrant or Weaviate.
@dosu the sizing interface has 1 million vectors as the lowest selectable value. If I set the vector dimension to 1024, I get an estimate of about 18 GB RAM, which is way below the 80 GB I provide. This doesn't seem right, as the values on my server are (top):
894742 root 20 0 122.3g 71.0g 290780 S 9.1 37.6 341:22.45 milvus
and free -g shows (total / used / free / shared / buff/cache / available):
RAM:  188  115  2  0  69  72
Swap:  15   15  0
The discrepancy you’re seeing between the Milvus Sizing Tool’s estimate (18 GB RAM for 1M vectors, dim 1024) and your actual server usage (Milvus using 71 GB RAM, with swap fully used) is a known limitation of sizing calculators—they provide only a baseline for the vector data itself, not the full operational footprint.
The Sizing Tool typically estimates the memory needed to store the raw vectors and basic index structures, but real-world Milvus deployments often use much more RAM due to several factors:
- Milvus loads entire collections (including all vectors, indexes, and metadata) into memory for search. If your collection is large or your chunking strategy produces many vectors, memory usage grows rapidly.
- Index types (like IVF, HNSW, or others) can require significant extra RAM, especially for high-dimensional data.
- Milvus also uses memory for caching, background processes, query buffers, and internal data structures, which the sizing tool does not account for.
- Docker overhead and memory fragmentation can further increase actual usage.
- If swap is fully used, it means the system is under memory pressure, which can degrade performance and cause instability.
The Sizing Tool’s minimum of 1M vectors may not match your actual collection size, but even so, its estimate is for the vector data only—not the total runtime environment.
To address this, you can try the Milvus config changes you mentioned (limiting cache, buffer, and preload settings), reduce chunking granularity, or split very large knowledgebases into smaller collections. If Milvus continues to exceed your memory limits, Qdrant or Weaviate may be more RAM-efficient alternatives, but migration would require re-indexing your data.
There is no Dify-specific documentation that explains this discrepancy, but these behaviors are consistent with how Milvus and similar vector databases manage memory in production environments.
@dosu
I've deleted those large problematic knowledge bases and am currently ingesting again with Milvus. top now shows:
116918 root 20 0 152.9g 40.5g 134604 S 13.0 21.5 13,32 milvus
and NO problem with server/Docker restarts. 293 docs with 150 pages each have already been ingested, and retrieval works fine.
So! The main difference now is that I don't ingest the whole doc folder; I run smaller batches of about 20 docs per run.
What I can say is that in the former runs I had ingested the whole folder and had issues during ingestion: either some Dify Docker container or LM Studio stopped working, which left a lot of documents "in queue" that never restarted after the downtime issues were fixed. So I cleaned the "in queue" documents from the database and disk and ingested them again.
But I guess that with these downtimes stopping containers and such, the vector database got corrupted, which then somehow leads to memory issues. Keeping my fingers crossed, but I suspect the error I reported here is not a memory issue but a database corruption issue.
It would be great if Dify would a) provide some restart-ingestion task so the "in queue" files get processed, and b) provide hints on how to fix a corrupt vector DB, because starting from scratch isn't fun at all.
You’re correct that ingestion interruptions and container downtimes can leave documents stuck “in queue” and potentially cause vector DB inconsistencies. Here’s what you can do in Dify and with Milvus to address these issues:
1. Restarting Ingestion for “In Queue” Documents
Dify has built-in mechanisms to recover or retry document indexing tasks that get stuck due to failures or downtime. If you encounter documents stuck in the “in queue” or “indexing” state, you can:
- Upgrade to the latest version of Dify, as several fixes for queue handling and scheduler reliability have been released.
- Restart the dify-worker service. This often helps clear stuck tasks and resumes processing for documents in the queue. (Reference)
- Dify's backend includes asynchronous tasks like recover_document_indexing_task and retry_document_indexing_task that can be triggered to recover or retry indexing for specific documents. These are typically invoked automatically, but can also be triggered via API or admin scripts if you have access. (Reference)
- If you see persistent "Queuing" status, it may be due to a Redis disconnection. Restarting the Worker node is recommended in this case. (Reference)
2. Preventing and Handling Vector DB Corruption (Milvus)
There isn’t a Dify-specific tool for repairing a corrupt Milvus collection. If you suspect corruption, the safest approach is to remove the problematic collection and re-ingest the data, as you did. To reduce the risk of corruption in the future:
- Always ingest in smaller batches, as you’re doing now, to minimize the impact of interruptions.
- Apply Milvus config best practices:
  - Set memUsageLimitRatio to limit memory usage for query nodes.
  - Adjust cpuCacheCapacity and insertBufferSize to fit your hardware.
  - Use preloadCollection: false to avoid loading entire collections at startup. (Reference)
- Monitor your Docker and hardware environment to avoid forced restarts during ingestion.
3. Considering Qdrant or Other Vector DBs
Qdrant is generally more RAM-efficient and robust for large-scale ingestion and is fully supported by Dify. However, switching requires re-indexing your data. If you want to test Qdrant, try migrating a subset of your knowledge base first to compare stability and resource usage.
Summary:
- Upgrade Dify and restart the worker to clear stuck “in queue” documents.
- Use Dify’s recovery tasks if you have admin/API access.
- Ingest in smaller batches and tune Milvus config to prevent future issues.
- There’s no built-in repair for a corrupt Milvus DB; re-ingestion is the safest fix.
- Qdrant is a viable alternative if Milvus continues to cause problems.
If you need more details on triggering Dify’s recovery tasks or want migration steps for Qdrant, let me know!
@dosu
Your explanations under "1. Restarting Ingestion for 'In Queue' Documents" are very welcome, but you linked a .py script regarding the asynchronous tasks. Is there documentation on how to implement or use this script?
There isn’t dedicated end-user documentation for the recover_document_indexing_task.py script, but I can summarize how it works and how you can use it based on the code and internal usage patterns.
The script defines a Celery asynchronous task called recover_document_indexing_task. Its purpose is to recover document indexing for a specific document in a dataset. You trigger it by calling:
recover_document_indexing_task.delay(dataset_id, document_id)
This queues the recovery task for background execution. The task checks the document’s current indexing status and resumes or restarts the appropriate step (waiting, parsing, cleaning, splitting, or indexing). It logs progress and handles exceptions such as paused documents. You can see the implementation and usage example in the code comments here.
In practice, Dify's backend services (such as dataset_service.py) call this task automatically when you use recovery features via the admin/API. For example, the recover_document method updates the document's status and then triggers the recovery task with .delay() (see code).
To use this directly, you need access to the backend Python environment with Celery running and the correct dataset and document IDs. There is no separate CLI or UI for this specific task; it’s meant to be triggered programmatically or via the admin/API layer.
If you want to recover stuck documents and have backend access, you can use a Python shell or script to import the task and call .delay() as shown above. Make sure your Celery worker is running and connected to the same broker as your Dify instance.
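For illustration, a minimal sketch of triggering the task by hand, assuming the module path tasks/recover_document_indexing_task.py used by recent Dify versions and a Python/Flask shell started inside the API container so the Celery broker configuration is already loaded; the IDs are placeholders you would look up in the dataset/document tables or the UI URLs:

```python
# Assumed environment: a shell inside the Dify API container (e.g. via `flask shell`),
# with a running dify-worker connected to the same Redis broker.
from tasks.recover_document_indexing_task import recover_document_indexing_task

dataset_id = "00000000-0000-0000-0000-000000000000"   # placeholder dataset UUID
document_id = "11111111-1111-1111-1111-111111111111"  # placeholder document UUID

# Queue the recovery task; the worker picks it up and resumes indexing
# from whatever step the document was stuck in (waiting/parsing/splitting/indexing).
result = recover_document_indexing_task.delay(dataset_id, document_id)
print(result.id)  # Celery task id, useful for finding it in the worker logs
```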
If you need more details on how to get the IDs or run this in your environment, let me know!
@dosu I'm now conducting tests with Qdrant. The basic functionality works, but ingestion is a lot slower than with Milvus. Can this be optimized?
Qdrant ingestion speed can be influenced by several configuration options in Dify. Here are some actionable steps you can take to optimize performance:
- Enable gRPC: By default, Qdrant uses HTTP, but enabling gRPC can significantly improve ingestion throughput, especially for large batches. Set QDRANT_GRPC_ENABLED=True and ensure your Qdrant server exposes the gRPC port (default is 6334) (source).
- Tune Client Timeout: The default client timeout is 20 seconds (QDRANT_CLIENT_TIMEOUT). If you have network latency or large batches, increasing this value can help prevent timeouts during ingestion (source).
- Batch Size: If possible, adjust the batch size for ingestion. Larger batches generally improve throughput, but if batches are too large, you may hit memory or timeout limits.
- Replication Factor: If you are running a single-node Qdrant instance, keep QDRANT_REPLICATION_FACTOR=1 for best performance. Higher replication factors are only useful in distributed setups and can slow down ingestion (source).
- Server Resources: Ensure your Qdrant server has sufficient CPU and disk I/O bandwidth. Qdrant is generally more RAM-efficient than Milvus, but ingestion speed can still be limited by hardware.
- Qdrant Version: Make sure you are using a recent version of Qdrant, as performance improvements are frequent.
There is no Dify-specific documentation or user-facing guide for optimizing Qdrant ingestion speed, but these configuration options are exposed in Dify’s backend and can be set via environment variables or configuration files source.
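For reference, a standalone sketch (not Dify's own code) of what the gRPC and timeout settings map to in the underlying qdrant-client library; the URL and ports are assumed Docker defaults and may differ in your setup:

```python
from qdrant_client import QdrantClient

# Rough client-level equivalent of QDRANT_GRPC_ENABLED=True and a raised QDRANT_CLIENT_TIMEOUT:
# prefer_grpc sends bulk operations over the gRPC port, which is usually faster
# than HTTP for large ingestion batches.
client = QdrantClient(
    url="http://localhost:6333",  # assumption: Qdrant on the default HTTP port
    grpc_port=6334,               # default gRPC port; must be exposed by the Qdrant container
    prefer_grpc=True,
    timeout=60,                   # seconds; raise if large batches hit timeouts
)

print(client.get_collections())   # quick connectivity check
```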
If you have already enabled gRPC and tuned these settings but still see slow ingestion, the bottleneck may be in the embedding generation or the Dify ingestion pipeline itself, rather than Qdrant. In that case, monitoring resource usage during ingestion can help identify the limiting factor.
Let me know if you want more details on how to set these options or if you have specific hardware constraints.
@mamema I just encountered the same problem and found your issue. After some investigation, I think for memory usage calculation, the number of vectors should be the number of paragraphs in total in the knowledge base. That might explain why your 500 docs and 150 pages each trigger OOM.
Hi, @mamema. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- You reported retrieval failures with Milvus in a large self-hosted Dify 1.4.1 setup involving about 500 documents, encountering MilvusException errors related to collection recovery and missing channels.
- The issue is likely due to Milvus collection memory/resource limits, with ingestion interruptions possibly causing vector DB corruption and stuck documents.
- Suggestions included checking collection state, Milvus version compatibility, configuration tuning, Docker networking, and restarting ingestion tasks.
- You tested Qdrant as an alternative but found ingestion slower; I provided tips to optimize Qdrant ingestion performance.
- Another user confirmed that vector count corresponds to total paragraphs, explaining OOM issues with large document sets.
Next Steps:
- Please let me know if this issue is still relevant with the latest version of Dify and if you need further assistance by commenting here.
- Otherwise, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!