Knowledge Base Ingestion Node
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing? Tell me about your story.
When building workflows we often need to take the output of, for example, HTTP nodes and ingest it into one or more knowledge bases. Today we have to do this with an HTTP node, which is cumbersome and error prone because we must define all the JSON request settings manually.
We would like a node we can send data to, configured to ingest that data into a knowledge base asynchronously, so the workflow does not have to wait for it to finish when the payload is large.
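For reference, a minimal sketch of what the manual HTTP-node setup amounts to today. The endpoint path, field names, and payload shape are assumptions based on Dify's Knowledge API ("create document by text") and should be checked against the current docs before use; the API key is a placeholder.

```python
import json

def build_ingest_request(dataset_id: str, doc_name: str, text: str) -> dict:
    """Return the method/url/headers/body a workflow author must hand-craft
    in an HTTP node to push extracted text into a knowledge base.
    All endpoint and field names here are assumptions, not confirmed API."""
    return {
        "method": "POST",
        # assumed Dify Knowledge API endpoint; verify against the docs
        "url": f"https://api.dify.ai/v1/datasets/{dataset_id}/document/create-by-text",
        "headers": {
            "Authorization": "Bearer <dataset-api-key>",  # placeholder
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "name": doc_name,                       # document name shown in the KB
            "text": text,                           # pre-processed text to ingest
            "indexing_technique": "high_quality",
            "process_rule": {"mode": "automatic"},  # let the KB chunk/clean
        }),
    }

req = build_ingest_request("my-dataset-id", "crawl-2024-06-01", "extracted text ...")
print(req["method"], req["url"])
```

Every one of these settings has to be re-specified by hand in each workflow, which is exactly the boilerplate a dedicated ingestion node would absorb.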
2. Additional context or comments
No response
3. Can you help us with this feature?
- [ ] I am interested in contributing to this feature.
@Yawen-1010 @guchenhe FYI
Hi, @benjamin-mogensen. Thanks for the suggestion — this is a valuable use case and aligns well with what we’re planning.
To better understand your needs and how we can support them, could you help clarify a few things?
- Is the data you want to send plain text or a file (e.g. PDF, DOCX)?
- If it’s text, does it require pre-processing, such as cleaning or chunking, before ingestion?
- Do you expect the node to: a) append chunks to an existing document, b) create a new knowledge base, or c) create a new document in an existing knowledge base?
We do have plans to build a dedicated knowledge base ingestion node, which will be part of the RAG 2.0 Project. However, it’s scheduled for a later phase of development, so it might take some time before it's available.
Hi @Yawen-1010
> Is the data you want to send plain text or a file (e.g. PDF, DOCX)?

We extract content via external services called from HTTP nodes, so the content is text.

> If it’s text, does it require pre-processing, such as cleaning or chunking, before ingestion?

No, the external services handle that for us.

> Do you expect the node to: a) Append chunks to an existing document?

No, not as a starting point; that is too fine-grained right now.

> b) Create a new knowledge base, or

Yes, this should be possible, but conditionally, e.g. create the KB only if it does not exist.

> c) Create a new document in an existing knowledge base.

Yes, this is the primary use case. In the node, select a KB (or create it if it does not exist) and provide an input variable to be ingested as a new document. It should also be possible to name the document from other workflow variables, not just a GUID.
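The requested behaviour (select a knowledge base, create it if missing, then ingest an input variable as a new named document) could be sketched roughly as below. This is an illustrative in-memory model only; every name here is hypothetical, and in Dify the ingest step would become an asynchronous API call rather than a dictionary write.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """Toy stand-in for a knowledge base: just a name and its documents."""
    name: str
    documents: dict = field(default_factory=dict)  # doc name -> text

class KBStore:
    """Toy stand-in for the set of knowledge bases visible to a workflow."""
    def __init__(self):
        self.kbs = {}

    def get_or_create(self, name: str) -> KnowledgeBase:
        # the "create if not exists" behaviour requested above
        return self.kbs.setdefault(name, KnowledgeBase(name))

def ingest_node(store: KBStore, kb_name: str, doc_name: str, text: str) -> str:
    """Sketch of the node: resolve/create the KB, add the document, return a ref."""
    kb = store.get_or_create(kb_name)
    kb.documents[doc_name] = text  # in Dify this would be an async ingestion call
    return f"{kb_name}/{doc_name}"

store = KBStore()
# document name built from workflow variables, not just a GUID
ref = ingest_node(store, "support-articles", "crawl-2024-06-01", "extracted text")
print(ref)  # -> support-articles/crawl-2024-06-01
```

The key design point is that the node is idempotent on the KB itself (repeated runs reuse the same base) while each run can still mint a distinctly named document.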
🧠💬 Whoa. Finally someone said it—that brittle HTTP node flow feels like trying to pour wine through a paper straw during an earthquake. We don’t need faster ingestion. We need respectful ingestion. One that understands it’s being trusted with knowledge.
You’re not just pushing JSON. You’re initiating a semantic contract. That node should wait not for data, but for meaning to arrive.
I’ve been building a semantic ingestion layer that treats knowledge as temporal fragments — not just what’s said, but when it deserves to be remembered. It uses something I call a “comprehension latency model” — imagine if your KB had an attention span. That way, async ingestion becomes not just non-blocking, but intention-aware. Like memory, but with manners.
Also, layering KB ingestion across semantic epochs lets you throttle ingestion by idea-density, not just size. HTTP nodes choke on that. But meaning doesn’t travel at 10Gbps — it travels at resonance.
If you’re ever down to test something wild, I wrote up a PDF on this whole thing. It’s got use cases, tension modeling, and a surprise endorsement from the tesseract.js legend (36k ⭐️ on GitHub).
https://github.com/onestardao/WFGY
Might help you rethink ingestion not as an operation… but as an invitation to remember.
Cheers. 🍷
Hi, @benjamin-mogensen. I'm Dosu (https://dosu.dev), and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- You requested an asynchronous workflow node to ingest text data into a knowledge base, aiming to improve on the current manual HTTP node process.
- Maintainer Yawen-1010 recognized the use case and mentioned plans for a dedicated ingestion node in a future RAG 2.0 phase.
- You clarified the data is pre-processed text and the node should support creating new knowledge bases or documents with customizable naming.
- Another user contributed ideas about semantic, intention-aware ingestion, suggesting a more advanced approach beyond simple data pushing.
Next Steps:
- Please let me know if this feature request is still relevant to your needs with the latest version of Dify by commenting on this issue.
- If I do not hear back within 15 days, I will automatically close this issue to keep the backlog manageable.
Thank you for your understanding and contribution!