Trigger the knowledge base pipeline via API?
Self Checks
- [x] I have read the Contributing Guide and Language Policy.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report, otherwise it will be closed.
- [x] Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing? Tell me about your story.
I had raised some issues about this, which were then closed for unknown reasons:
https://github.com/langgenius/dify/issues/28278
Since I also have this requirement, I attempted to modify the code to enable the following three interfaces:
- /datasets/{dataset_id}/pipeline/workflow
- /datasets/pipeline/file-upload
- /datasets/{dataset_id}/pipeline/run
The modified files are:
- api/controllers/service_api/dataset/rag_pipeline/rag_pipeline_workflow.py
- api/controllers/service_api/__init__.py
I hope these fixes can be synced to the main branch as soon as possible, to avoid having to apply the same patch repeatedly when upgrading Dify.
2. Additional context or comments
No response
3. Can you help us with this feature?
- [ ] I am interested in contributing to this feature.
Step 1: Call GET /datasets/{dataset_id} and check that the runtime_mode field is set to rag_pipeline (pipeline mode).
Step 2: Upload a file via POST /datasets/pipeline/file-upload to obtain the file metadata.
Step 3: Access /datasets/{dataset_id}/pipeline/workflow?is_published=true to get the start_node_id.
Step 4: Trigger the pipeline via POST /datasets/{dataset_id}/pipeline/run with the parameters below.
Parameter Example:

```json
{
  "inputs": {},
  "start_node_id": "1766452619561",
  "datasource_type": "local_file",
  "datasource_info_list": [
    {
      "related_id": "c09e91df-14f9-46f6-98c8-9e40b2864b49",
      "name": "Introduction to the Scenery of Spring.docx",
      "size": 92171,
      "extension": "docx",
      "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "url": "",
      "transfer_method": "local_file",
      "credential_id": ""
    }
  ],
  "is_published": true,
  "response_mode": "blocking"
}
```
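For reference, here is a rough curl sketch of the four steps. It assumes a build that actually exposes the patched endpoints listed above (they are not part of the official service API), a dataset-scoped API key, and placeholder values for the host, key, IDs, and file name.

```bash
BASE='http://your-dify-host/v1'     # placeholder base URL
KEY='dataset-xxxx'                  # dataset-scoped (knowledge base) API key
DATASET_ID='6c9f15e6-df16-44ff-92f8-43f0841103c8'

# Step 1: confirm the dataset is in pipeline mode (runtime_mode should be "rag_pipeline")
curl -s "$BASE/datasets/$DATASET_ID" -H "Authorization: Bearer $KEY"

# Step 2: upload the file; the response contains the file id used as related_id in the payload
# (assumes the multipart form field is named "file")
curl -s -X POST "$BASE/datasets/pipeline/file-upload" \
  -H "Authorization: Bearer $KEY" \
  -F 'file=@./example.docx'

# Step 3: fetch the published pipeline workflow and look up the start_node_id
curl -s "$BASE/datasets/$DATASET_ID/pipeline/workflow?is_published=true" \
  -H "Authorization: Bearer $KEY"

# Step 4: trigger the pipeline; payload.json holds the parameter example shown above
curl -s -X POST "$BASE/datasets/$DATASET_ID/pipeline/run" \
  -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d @payload.json
```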
Here's how the pipeline triggering process works based on the current Dify implementation:

- You can check the dataset's `runtime_mode` using `GET /datasets/{dataset_id}`; the response includes this field, so you can confirm whether it is in pipeline mode (code reference, fields).
- Upload your file using `POST /datasets/pipeline/file-upload` with a dataset-scoped API key (knowledge base API key) as a Bearer token. The response gives you the file metadata (id, name, size, etc.) (discussion, code).
- There is no `/datasets/{dataset_id}/pipeline/workflow` endpoint. Instead, use `/datasets/{dataset_id}/pipeline/datasource-plugins?is_published=true` to list the datasource plugins for the pipeline. You may need to inspect the returned workflow graph or plugins to find the correct `start_node_id` (code).
- Trigger the pipeline with `POST /datasets/{dataset_id}/pipeline/run` using a JSON payload like your example. Required fields are: `inputs`, `start_node_id`, `datasource_type`, `datasource_info_list`, `is_published`, and `response_mode` (code, model).
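If your deployment exposes the datasource-plugins route instead of /pipeline/workflow, a minimal sketch of that call looks like this (the endpoint is experimental, the response shape is undocumented, and the host, key, and dataset id are placeholders):

```bash
# DATASET_ID and DATASET_KEY are placeholders for your dataset id and dataset API key.
# List the published pipeline's datasource plugins and pretty-print the result;
# inspect it manually for the node to use as start_node_id.
curl -s "http://your-dify-host/v1/datasets/$DATASET_ID/pipeline/datasource-plugins?is_published=true" \
  -H "Authorization: Bearer $DATASET_KEY" | jq .
```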
Caveats and requirements:
- All endpoints require authentication with a dataset API key. Using the wrong key type will result in a 400 error (discussion).
- Only certain file types are supported for upload (e.g., pdf, docx, txt). Images are not accepted as knowledge base documents (discussion).
- The `/datasets/{dataset_id}/pipeline/run` endpoint is considered experimental and lacks official documentation. It may not work reliably in all environments or versions (discussion, issue).
- If you encounter issues (e.g., 404 errors or the pipeline not running), you may need to patch your deployment or upgrade to the latest Dify version for improved reliability (issue, discussion).
- The most reliable way to trigger custom pipelines is still the Web UI. API automation may not support all custom nodes or workflows (discussion, issue).
Let me know if you need example requests or help interpreting the plugin/workflow graph to find the start_node_id.
I have tried the request below, but got a 404 response:
```bash
curl --location --request GET 'http://myServerIp:62990/v1/datasets/6c9f15e6-df16-44ff-92f8-43f0841103c8/pipeline/datasource-plugins?is_published=true' \
  --header 'Authorization: Bearer xxxx' \
  --header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
  --header 'Accept: */*' \
  --header 'Host: myServerIp:62990' \
  --header 'Connection: keep-alive'
```
Not Found
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
And my dataset info is below:
```bash
curl --location --request GET 'http://myserverip:62990/v1/datasets/6c9f15e6-df16-44ff-92f8-43f0841103c8' \
  --header 'Authorization: Bearer xxxx' \
  --header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
  --header 'Accept: */*' \
  --header 'Host: myserverip:62990' \
  --header 'Connection: keep-alive'
```
```json
{
  "id": "6c9f15e6-df16-44ff-92f8-43f0841103c8",
  "name": "标准规范知识库",
  "description": "",
  "provider": "vendor",
  "permission": "all_team_members",
  "data_source_type": "upload_file",
  "indexing_technique": "high_quality",
  "app_count": 2,
  "document_count": 61,
  "word_count": 2238746,
  "created_by": "c1795e27-8626-4735-84c9-1794a1a70e67",
  "author_name": "user",
  "created_at": 1762759794,
  "updated_by": "c1795e27-8626-4735-84c9-1794a1a70e67",
  "updated_at": 1766382697,
  "embedding_model": "Qwen3-Embedding-0.6B",
  "embedding_model_provider": "langgenius/xinference/xinference",
  "embedding_available": true,
  "retrieval_model_dict": {
    "search_method": "hybrid_search",
    "reranking_enable": true,
    "reranking_mode": "reranking_model",
    "reranking_model": {
      "reranking_provider_name": "langgenius/xinference/xinference",
      "reranking_model_name": "Qwen3-Reranker-0.6B"
    },
    "weights": {
      "weight_type": null,
      "keyword_setting": {
        "keyword_weight": 0.3
      },
      "vector_setting": {
        "vector_weight": 0.7,
        "embedding_model_name": "Qwen3-Embedding-0.6B",
        "embedding_provider_name": "langgenius/xinference/xinference"
      }
    },
    "top_k": 6,
    "score_threshold_enabled": false,
    "score_threshold": 0.0
  },
  "tags": [
    {
      "id": "7cb47cac-6388-4f6c-9fb7-4ebc041dbf57",
      "name": "规范设计助手",
      "type": "knowledge"
    }
  ],
  "doc_form": "text_model",
  "external_knowledge_info": {
    "external_knowledge_id": null,
    "external_knowledge_api_id": null,
    "external_knowledge_api_name": null,
    "external_knowledge_api_endpoint": null
  },
  "external_retrieval_model": {
    "top_k": 6,
    "score_threshold": 0.0,
    "score_threshold_enabled": false
  },
  "doc_metadata": [
    {
      "id": "289c68fb-9466-4889-8179-7a3c841509c1",
      "name": "standard_number",
      "type": "string"
    },
    {
      "id": "ace090b4-1d70-42ce-94b7-bf77cdd60e96",
      "name": "standard_name",
      "type": "string"
    }
  ],
  "built_in_field_enabled": false,
  "pipeline_id": "019a6cab-f7fc-7a65-b5ae-4a6910f99cd8",
  "runtime_mode": "rag_pipeline",
  "chunk_structure": "text_model",
  "icon_info": {
    "icon_type": null,
    "icon": null,
    "icon_background": null,
    "icon_url": null
  },
  "is_published": true,
  "total_documents": 61,
  "total_available_documents": 61,
  "enable_api": true,
  "is_multimodal": false
}
```
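Given that response, a quick sanity check of the fields the pipeline API cares about (a sketch, assuming jq is installed; the key is a placeholder):

```bash
curl -s 'http://myserverip:62990/v1/datasets/6c9f15e6-df16-44ff-92f8-43f0841103c8' \
  -H 'Authorization: Bearer xxxx' \
  | jq '{runtime_mode, pipeline_id, is_published, enable_api}'
# Expected here: runtime_mode "rag_pipeline", is_published true, enable_api true,
# which suggests the 404 comes from the route not existing in this build rather
# than from the dataset configuration.
```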
@keminar I followed your instructions, but I encountered an error when I got to step 2: "Step 2: Upload a file via /datasets/pipeline/file-upload to retrieve the file information." I used the edited code you provided, but my Docker container keeps restarting. Could this be due to additional steps or version incompatibility?
@keminar @tphrbeu-max @xiaolizi000000-afk @appleboy @tomoyuki28jp @claude
I attempted to use the API to upload files to trigger the knowledge pipeline.
Goal: Trigger the configured knowledge pipeline through the Dify knowledge base API.
APIs Tested:
- create_by_file API
  - Attempted to add a created_from parameter → the parameter was ignored
  - Used hierarchical_model and hierarchical processing rules → still processed with simple segmentation
  - Result: documents uploaded via the API show created_from = "api" instead of "rag-pipeline", meaning they bypass the full knowledge pipeline processing (no LLM analysis, only simple line-based segmentation)
Key Difference:
- Manual upload via Web UI: created_from = "rag-pipeline" → Full pipeline processing with LLM analysis and hierarchical segmentation
- API upload: created_from = "api" → Simple segmentation only, no LLM analysis
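For context, this is roughly the kind of request that was tested against the official knowledge API. The endpoint name and payload fields such as doc_form and process_rule vary by Dify version, so treat the data payload as illustrative of the attempt, not a working recipe; there is no supported created_from parameter, which is why documents end up with created_from = "api".

```bash
# DATASET_ID and DATASET_KEY are placeholders; so are the host and file path.
# Sketch of a create_by_file upload with hierarchical settings. Even with these
# options the document is indexed by the standard document pipeline, not the
# configured RAG pipeline.
curl -s -X POST "http://your-dify-host/v1/datasets/$DATASET_ID/document/create_by_file" \
  -H "Authorization: Bearer $DATASET_KEY" \
  -F 'data="{\"indexing_technique\":\"high_quality\",\"doc_form\":\"hierarchical_model\",\"process_rule\":{\"mode\":\"automatic\"}}";type=text/plain' \
  -F 'file=@./example.docx'
```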