dify
Fork Join Parallelism for Workflows
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing?
Some workflow configurations have high response latency because they execute multiple independent requests sequentially. Simple fork-join parallelism could greatly reduce this latency, for example by running multiple LLM generations in parallel with HTTP requests.
2. Describe the feature you'd like to see
Add Fork and Join blocks.
- Place a Fork anywhere an LLM/HTTP block could be placed.
- The output is a series of pathways which can be executed in parallel.
- In a pathway, only variables defined before the Fork can be accessed.
- Pathways may allow only a constrained selection of blocks. For example, in chat workflows, "Answer" blocks might need to be disabled since execution order isn't deterministic.
- For the configuration to be valid, all parallel pathways originating from a Fork must terminate in a Join block.
- The Join block waits for all pathways to finish, then resumes sequential execution.
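The semantics described above could be sketched in Python with `concurrent.futures`. This is only an illustrative model of the proposed Fork/Join behavior, not dify's API; the block functions (`llm_generate`, `http_fetch`) are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def llm_generate(prompt: str) -> str:
    # Placeholder for an LLM block; a real pathway would call a model.
    return f"summary of {prompt}"

def http_fetch(url: str) -> str:
    # Placeholder for an HTTP Request block.
    return f"body of {url}"

def run_workflow(query: str) -> str:
    # Fork: launch independent pathways. Each pathway sees only
    # variables defined before this point (here, `query`).
    with ThreadPoolExecutor() as pool:
        llm_future = pool.submit(llm_generate, query)
        http_future = pool.submit(http_fetch, "https://example.com")
        # Join: wait for every pathway to finish, then resume
        # sequential execution with their outputs.
        llm_out = llm_future.result()
        http_out = http_future.result()
    return f"{llm_out} | {http_out}"

print(run_workflow("weather"))
```

The key validity constraint maps naturally onto this model: every future submitted at the Fork must have `.result()` called at the Join, so no pathway can escape the join point.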
3. How will this feature improve your workflow or experience?
Improves response latency by performing orthogonal tasks in parallel.
4. Additional context or comments
Fork Join diagram courtesy of this paper.
5. Can you help us with this feature?
- [X] I am interested in contributing to this feature.
I think using workflows as tools will add to the benefit of this feature. I often take a query and execute multiple individual LLM queries against a cheaper LLM, rather than packing everything into a single completion. Or I might want to execute multiple searches via the searxng tool in parallel, then aggregate the responses.
At the moment, there is no way to do this without handling it externally to dify, where the token counts are no longer available to understand the total workflow cost.
A parallel nodes feature would be very useful; it's a common use case.
Here is a mock demo of the kind of workflow I have in mind. It could make workflows more efficient and flexible if the "parallel node" could be connected to any kind of node.
Probably needs a mechanism to do rate and concurrency limiting, but a fork-join or map-reduce pattern would make a lot of use-cases for the new iteration block run much faster.
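One way such a concurrency limit could work is a semaphore gating the forked pathways, so a fork can spawn many pathways while only a bounded number run at once. This is a hypothetical sketch, not dify's implementation; `MAX_CONCURRENT` and `limited` are illustrative names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # assumed per-fork concurrency limit
_gate = threading.Semaphore(MAX_CONCURRENT)

def limited(fn, *args):
    # Each pathway acquires the gate before running, so at most
    # MAX_CONCURRENT pathways execute simultaneously even if the
    # fork produces many more.
    with _gate:
        return fn(*args)

def square(x):
    # Stand-in for a pathway's work (e.g. an LLM or HTTP call).
    return x * x

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda x: limited(square, x), range(5)))

print(results)
```

Rate limiting (requests per second) would need an additional mechanism such as a token bucket, but the same gating point at pathway entry is a natural place to enforce it.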
Any update on ETA for this feature?
+1
any updates?
Eagerly waiting for this feature
This would be very useful.
This will greatly enhance the efficiency of my workflows. I look forward to seeing this feature implemented soon.
See https://github.com/langgenius/dify/releases/tag/0.8.0-beta1