
Trigger the knowledge base pipeline via API?

Open keminar opened this issue 2 weeks ago • 2 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I found an earlier issue about this, but it was closed for unknown reasons:

https://github.com/langgenius/dify/issues/28278

Since I have the same requirement, I modified the code to enable the following three endpoints:

  • /datasets/{dataset_id}/pipeline/workflow
  • /datasets/pipeline/file-upload
  • /datasets/{dataset_id}/pipeline/run

api/controllers/service_api/dataset/rag_pipeline/rag_pipeline_workflow.py

api/controllers/service_api/__init__.py
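
For reference, here is a rough, self-contained sketch of the general flask-restful pattern such a patch would follow. This is not Dify's actual code: the class name, route registration, and handler body are illustrative stand-ins, and the real controllers differ between Dify versions.

```python
# Hypothetical, self-contained sketch (not Dify's actual implementation):
# exposing a dataset-scoped pipeline route under the /v1 service API prefix.
from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app, prefix="/v1")


class RagPipelineRunApi(Resource):
    """Illustrative stand-in for a POST /datasets/<dataset_id>/pipeline/run handler."""

    def post(self, dataset_id):
        payload = request.get_json(force=True) or {}
        # A real handler would authenticate the dataset API key and hand the payload
        # (inputs, start_node_id, datasource_info_list, ...) to the pipeline service.
        return {"dataset_id": dataset_id, "received_keys": sorted(payload)}, 200


api.add_resource(RagPipelineRunApi, "/datasets/<string:dataset_id>/pipeline/run")

if __name__ == "__main__":
    app.run(port=5001)
```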

I hope these fixes can be synced to the main branch as soon as possible, to avoid having to apply the same patch repeatedly when upgrading Dify.

2. Additional context or comments

No response

3. Can you help us with this feature?

  • [ ] I am interested in contributing to this feature.

keminar avatar Dec 26 '25 08:12 keminar

Step 1: Access /datasets/{dataset_id} to check if the runtime_mode field is set to the pipeline mode.
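
As a hedged example of Step 1 (assuming the service API is served under /v1 and a knowledge-base/dataset API key is used as the Bearer token), the check might look like this; based on the dataset info shared later in this thread, the value to expect is "rag_pipeline":

```python
# Sketch of Step 1: confirm the dataset is in pipeline mode.
# Assumptions: base URL, /v1 prefix, and a dataset (knowledge base) API key.
import requests

BASE_URL = "http://your-dify-host/v1"        # adjust to your deployment
API_KEY = "dataset-xxxx"                     # knowledge-base API key
DATASET_ID = "6c9f15e6-df16-44ff-92f8-43f0841103c8"

resp = requests.get(
    f"{BASE_URL}/datasets/{DATASET_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("runtime_mode"))       # expected: "rag_pipeline"
```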

Step 2: Upload a file via /datasets/pipeline/file-upload to retrieve the file information.
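
A sketch of Step 2, assuming the file is sent as a multipart field named "file" (mirroring Dify's other upload endpoints); the id in the returned metadata appears to be what the run payload later uses as related_id:

```python
# Sketch of Step 2: upload the file and capture its metadata.
# Assumption: the endpoint accepts a multipart field named "file".
import requests

BASE_URL = "http://your-dify-host/v1"
API_KEY = "dataset-xxxx"

with open("Introduction to the Scenery of Spring.docx", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/datasets/pipeline/file-upload",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        timeout=120,
    )
resp.raise_for_status()
file_info = resp.json()
print(file_info)   # expect id, name, size, extension, mime_type, ...
```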

Step 3: Access /datasets/{dataset_id}/pipeline/workflow?is_published=true to get the start_node_id.
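
A sketch of Step 3, assuming the patched /pipeline/workflow endpoint returns the published workflow definition as JSON; the exact response shape is not documented in this thread, so the code simply dumps it for manual inspection:

```python
# Sketch of Step 3: fetch the published pipeline workflow and look for the start node.
# Assumption: the (patched) endpoint returns the workflow definition as JSON.
import json
import requests

BASE_URL = "http://your-dify-host/v1"
API_KEY = "dataset-xxxx"
DATASET_ID = "6c9f15e6-df16-44ff-92f8-43f0841103c8"

resp = requests.get(
    f"{BASE_URL}/datasets/{DATASET_ID}/pipeline/workflow",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"is_published": "true"},
    timeout=30,
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2, ensure_ascii=False))
# Inspect the output for the datasource/start node and note its id
# (e.g. "1766452619561" in the parameter example below).
```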

Step 4: Trigger the pipeline via /datasets/{dataset_id}/pipeline/run.

Parameter example:

```json
{
  "inputs": {},
  "start_node_id": "1766452619561",
  "datasource_type": "local_file",
  "datasource_info_list": [
    {
      "related_id": "c09e91df-14f9-46f6-98c8-9e40b2864b49",
      "name": "Introduction to the Scenery of Spring.docx",
      "size": 92171,
      "extension": "docx",
      "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "url": "",
      "transfer_method": "local_file",
      "credential_id": ""
    }
  ],
  "is_published": true,
  "response_mode": "blocking"
}
```
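
And a sketch of Step 4, posting the parameter example above to the run endpoint; it assumes the payload has been saved verbatim to a local file named pipeline_run.json:

```python
# Sketch of Step 4: POST the payload shown above to the run endpoint.
import json
import requests

BASE_URL = "http://your-dify-host/v1"
API_KEY = "dataset-xxxx"
DATASET_ID = "6c9f15e6-df16-44ff-92f8-43f0841103c8"

# Payload exactly as in the parameter example above, stored in pipeline_run.json.
with open("pipeline_run.json", "r", encoding="utf-8") as f:
    payload = json.load(f)

resp = requests.post(
    f"{BASE_URL}/datasets/{DATASET_ID}/pipeline/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```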

keminar avatar Dec 26 '25 08:12 keminar

Here's how the pipeline triggering process works based on the current Dify implementation:

  1. You can check the dataset's runtime_mode using GET /datasets/{dataset_id}—the response includes this field, so you can confirm if it's in pipeline mode (code reference, fields).

  2. Upload your file using POST /datasets/pipeline/file-upload with a dataset-scoped API key (knowledge base API key) as a Bearer token. The response gives you file metadata (id, name, size, etc.) (discussion, code).

  3. There is no /datasets/{dataset_id}/pipeline/workflow endpoint. Instead, use /datasets/{dataset_id}/pipeline/datasource-plugins?is_published=true to list datasource plugins for the pipeline. You may need to inspect the returned workflow graph or plugins to find the correct start_node_id (code).

  4. Trigger the pipeline with POST /datasets/{dataset_id}/pipeline/run using a JSON payload like your example. Required fields are: inputs, start_node_id, datasource_type, datasource_info_list, is_published, and response_mode (code, model).
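
Since item 3 above points at a different endpoint than the one keminar patched in, here is a hedged sketch of querying it; the response shape is not documented in this thread, so the example only prints it for inspection:

```python
# Sketch: list datasource plugins for a published pipeline, per item 3 above.
# Assumptions: /v1 prefix and a dataset-scoped API key; response shape unverified.
import json
import requests

BASE_URL = "http://your-dify-host/v1"
API_KEY = "dataset-xxxx"
DATASET_ID = "6c9f15e6-df16-44ff-92f8-43f0841103c8"

resp = requests.get(
    f"{BASE_URL}/datasets/{DATASET_ID}/pipeline/datasource-plugins",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"is_published": "true"},
    timeout=30,
)
print(resp.status_code)   # a later comment in this thread reports a 404 here
if resp.ok:
    print(json.dumps(resp.json(), indent=2, ensure_ascii=False))
```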

Caveats and requirements:

  • All endpoints require authentication with a dataset API key. Using the wrong key type will result in a 400 error (discussion).
  • Only certain file types are supported for upload (e.g., pdf, docx, txt). Images are not accepted as knowledge base documents (discussion).
  • The /datasets/{dataset_id}/pipeline/run endpoint is considered experimental and lacks official documentation. It may not work reliably in all environments or versions (discussion, issue).
  • If you encounter issues (e.g., 404 errors or pipeline not running), you may need to patch your deployment or upgrade to the latest Dify version for improved reliability (issue, discussion).
  • The most reliable way to trigger custom pipelines is still via the Web UI. API automation may not support all custom nodes or workflows (discussion, issue).

Let me know if you need example requests or help interpreting the plugin/workflow graph to find the start_node_id.




dosubot[bot] avatar Dec 26 '25 08:12 dosubot[bot]

I have tried the request below, but I get a 404 response:

```
curl --location --request GET 'http://myServerIp:62990/v1/datasets/6c9f15e6-df16-44ff-92f8-43f0841103c8/pipeline/datasource-plugins?is_published=true' \
  --header 'Authorization: Bearer xxxx' \
  --header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
  --header 'Accept: */*' \
  --header 'Host: myServerIp:62990' \
  --header 'Connection: keep-alive'
```

404 Not Found

Not Found

The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

tphrbeu-max avatar Dec 28 '25 10:12 tphrbeu-max

And my dataset info is below:

```
curl --location --request GET 'http://myserverip:62990/v1/datasets/6c9f15e6-df16-44ff-92f8-43f0841103c8' \
  --header 'Authorization: Bearer xxxx' \
  --header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
  --header 'Accept: */*' \
  --header 'Host: myserverip:62990' \
  --header 'Connection: keep-alive'
```

Response:

```json
{
  "id": "6c9f15e6-df16-44ff-92f8-43f0841103c8",
  "name": "标准规范知识库",
  "description": "",
  "provider": "vendor",
  "permission": "all_team_members",
  "data_source_type": "upload_file",
  "indexing_technique": "high_quality",
  "app_count": 2,
  "document_count": 61,
  "word_count": 2238746,
  "created_by": "c1795e27-8626-4735-84c9-1794a1a70e67",
  "author_name": "user",
  "created_at": 1762759794,
  "updated_by": "c1795e27-8626-4735-84c9-1794a1a70e67",
  "updated_at": 1766382697,
  "embedding_model": "Qwen3-Embedding-0.6B",
  "embedding_model_provider": "langgenius/xinference/xinference",
  "embedding_available": true,
  "retrieval_model_dict": {
    "search_method": "hybrid_search",
    "reranking_enable": true,
    "reranking_mode": "reranking_model",
    "reranking_model": { "reranking_provider_name": "langgenius/xinference/xinference", "reranking_model_name": "Qwen3-Reranker-0.6B" },
    "weights": {
      "weight_type": null,
      "keyword_setting": { "keyword_weight": 0.3 },
      "vector_setting": { "vector_weight": 0.7, "embedding_model_name": "Qwen3-Embedding-0.6B", "embedding_provider_name": "langgenius/xinference/xinference" }
    },
    "top_k": 6,
    "score_threshold_enabled": false,
    "score_threshold": 0.0
  },
  "tags": [ { "id": "7cb47cac-6388-4f6c-9fb7-4ebc041dbf57", "name": "规范设计助手", "type": "knowledge" } ],
  "doc_form": "text_model",
  "external_knowledge_info": { "external_knowledge_id": null, "external_knowledge_api_id": null, "external_knowledge_api_name": null, "external_knowledge_api_endpoint": null },
  "external_retrieval_model": { "top_k": 6, "score_threshold": 0.0, "score_threshold_enabled": false },
  "doc_metadata": [
    { "id": "289c68fb-9466-4889-8179-7a3c841509c1", "name": "standard_number", "type": "string" },
    { "id": "ace090b4-1d70-42ce-94b7-bf77cdd60e96", "name": "standard_name", "type": "string" }
  ],
  "built_in_field_enabled": false,
  "pipeline_id": "019a6cab-f7fc-7a65-b5ae-4a6910f99cd8",
  "runtime_mode": "rag_pipeline",
  "chunk_structure": "text_model",
  "icon_info": { "icon_type": null, "icon": null, "icon_background": null, "icon_url": null },
  "is_published": true,
  "total_documents": 61,
  "total_available_documents": 61,
  "enable_api": true,
  "is_multimodal": false
}
```

tphrbeu-max avatar Dec 28 '25 11:12 tphrbeu-max

@keminar I followed your instructions, but I ran into a problem at Step 2 ("Upload a file via /datasets/pipeline/file-upload to retrieve the file information"). After applying the edited code you provided, my Docker container keeps restarting. Could this be due to additional steps I'm missing, or a version incompatibility?

xiaolizi000000-afk avatar Dec 30 '25 07:12 xiaolizi000000-afk

@keminar @tphrbeu-max @xiaolizi000000-afk @appleboy @tomoyuki28jp @claude

I attempted to use the API to upload files to trigger the knowledge pipeline.

Goal: Trigger the configured knowledge pipeline through the Dify knowledge base API.

APIs Tested:

  1. create_by_file API
     • Attempted to add a created_from parameter → the parameter was ignored
     • Used hierarchical_model and hierarchical processing rules → still processed with simple segmentation
     • Result: documents uploaded via the API show created_from = "api" instead of "rag-pipeline", meaning they bypass the full knowledge pipeline processing (no LLM analysis, only simple line-based segmentation); a hedged sketch of the kind of request tested follows below
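
For context, a hedged sketch of the kind of request that was tested. The path spelling (create_by_file vs create-by-file) and the exact process_rule fields vary between Dify versions; the values below are illustrative, and per the report above they do not change created_from or trigger pipeline processing:

```python
# Hedged sketch of the create_by_file attempt described above.
# Field names and process_rule values are illustrative; per the report, this path
# leaves created_from as "api" and skips the knowledge pipeline.
import json
import requests

BASE_URL = "http://your-dify-host/v1"
API_KEY = "dataset-xxxx"
DATASET_ID = "6c9f15e6-df16-44ff-92f8-43f0841103c8"

data = {
    "indexing_technique": "high_quality",
    "doc_form": "hierarchical_model",                        # parent-child chunking
    "process_rule": {"mode": "hierarchical", "rules": {}},   # illustrative rules
}

with open("example.docx", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/datasets/{DATASET_ID}/document/create_by_file",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"data": json.dumps(data)},
        timeout=120,
    )
resp.raise_for_status()
doc = resp.json().get("document", {})
print(doc.get("created_from"))   # reportedly "api", not "rag-pipeline"
```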

Key Difference:

  • Manual upload via Web UI: created_from = "rag-pipeline" → Full pipeline processing with LLM analysis and hierarchical segmentation
  • API upload: created_from = "api" → Simple segmentation only, no LLM analysis

6mvp6 avatar Dec 31 '25 03:12 6mvp6