[Question]: how to merge ragflow?
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly ( 非英文标题的提交将会被直接关闭 ) (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
how to merge ragflow?
@dosu
I checked the documentation, and it only covers data migration. To speed up parsing, I deployed the service across dozens of servers. Now that parsing is finished, how can I merge all of that data onto a single machine? @dosu
Hi @giserd! I'm Dosu and I’m helping the ragflow team.
RagFlow supports several types of "merge" operations, but there isn't a single built-in feature to merge entire RagFlow instances or codebases.
- If you want to merge knowledge bases from multiple RagFlow deployments, the recommended approach is to set up a distributed architecture: run multiple RagFlow nodes with identical configs pointing to the same Elasticsearch, MySQL, Redis, and Minio services. This way, all nodes contribute to a unified knowledge base without manual merging. Details here.
- If you mean merging document blocks after parsing (for example, reducing fragmentation from MinerU or DeepDoc), use the HierarchicalMerger component. It lets you define custom regex rules and hierarchy depth to merge small chunks into larger, more meaningful ones. This works for all parsers and is configurable in the workflow UI. See code and usage, and the rough sketch after this list.
- If you're asking about merging workflow branches, RagFlow's merge nodes currently trigger as soon as any upstream branch finishes, not after all branches complete. Achieving "wait-for-all" behavior (like Dify) would require custom workflow engine changes. More info; a small illustration of the two behaviors also follows this list.
Could you clarify which type of merging you need help with? (Knowledge base, document blocks, workflows, or something else?) That way I can give you a more specific guide or code reference.
To reply, just mention @dosu.
When I deployed, I did not point the Docker containers at the same Elasticsearch, MySQL, Redis, and Minio services. Is there any way to merge the data that is now scattered across so many servers? @dosu
I do not think it is feasible to merge data across so many separate engines.
Or could I use list_chunk and add_chunk to create a dataset and then add every chunk from all of the RagFlow instances into a single RagFlow? @dosu
To clarify, the list and add operations apply to datasets, while RagFlow refers to the entire service. These are two different concepts, so it seems there may be some confusion between them.
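That said, if you do want to try the dataset-level route you describe, the outline would be: list the chunks of each document on a source deployment and re-add them to a dataset on the target deployment. The sketch below uses `requests` against the HTTP API; the endpoint paths, pagination parameters, response layout, and payload fields are assumptions based on the documented list/add chunk operations and must be checked against your RagFlow version's API reference. Note that adding a chunk attaches it to an existing document, so the target dataset needs documents to attach chunks to (for example, upload the original files there first). All IDs, keys, and URLs below are placeholders.

```python
import requests


def _headers(api_key):
    # Bearer-token auth, as used by the HTTP API (verify against the API reference).
    return {"Authorization": f"Bearer {api_key}"}


def list_chunks(base_url, api_key, dataset_id, document_id, page_size=100):
    """Yield the chunks of one document from a source deployment, page by page."""
    page = 1
    while True:
        resp = requests.get(
            f"{base_url}/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks",
            headers=_headers(api_key),
            params={"page": page, "page_size": page_size},  # assumed pagination params
            timeout=60,
        )
        resp.raise_for_status()
        chunks = resp.json().get("data", {}).get("chunks", [])  # assumed response layout
        if not chunks:
            return
        yield from chunks
        page += 1


def add_chunk(base_url, api_key, dataset_id, document_id, content, keywords=None):
    """Re-create one chunk on the target deployment."""
    resp = requests.post(
        f"{base_url}/api/v1/datasets/{dataset_id}/documents/{document_id}/chunks",
        headers=_headers(api_key),
        json={"content": content, "important_keywords": keywords or []},
        timeout=60,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # Hypothetical usage: copy every chunk of one source document into a target document.
    SRC = {"url": "http://source-node:9380", "key": "SRC_API_KEY",
           "dataset": "SRC_DATASET_ID", "doc": "SRC_DOC_ID"}
    DST = {"url": "http://target-node:9380", "key": "DST_API_KEY",
           "dataset": "DST_DATASET_ID", "doc": "DST_DOC_ID"}
    for chunk in list_chunks(SRC["url"], SRC["key"], SRC["dataset"], SRC["doc"]):
        add_chunk(DST["url"], DST["key"], DST["dataset"], DST["doc"],
                  chunk.get("content", ""), chunk.get("important_keywords"))
```

Even if this works for you, per-chunk copying does not carry over the raw files or document-level metadata, so for dozens of servers the shared-backend deployment described in the first bullet above is usually the saner long-term setup.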