flowable-engine

Different behaviour between parallel and inclusive gateways

Status: Open • wberges opened this issue 5 years ago • 18 comments

Hello,

It seems that Parallel and Inclusive Gateways don't have the same behaviour (in the same configuration).

I tested processes with 3 parallel branches (one of which contains a catching message event), using 2 test sets: one with parallel gateways and one with inclusive gateways. In this test the inclusive gateway always activates all 3 branches and has no default flow, so it should be equivalent to the parallel gateway.

Test set 1: Parallel gateways (image)

Test set 2: Inclusive gateways (image)

For each set, I tested 4 different settings on the join gateway:

  • Sync (no Async & Exclusive flags)
  • Sync & Exclusive
  • Async
  • Async & Exclusive

What seems strange to me is that parallel and inclusive gateways do not give the same result. With the parallel gateway, 2 of the scenarios run without an OptimisticLock exception. But with the inclusive gateway, every scenario generates an OptimisticLock exception! And that's really problematic (for me at least :)).

For now, I see 2 "workarounds" to avoid OptimisticLockExceptions:

  • use a triggerable task that combines both the send and receive tasks. But in that case I can't use a Camel task, and I'm not sure it would solve the problem.
  • convert my parallel process into a sequential one (3 branches merged into 1), which is really not what I want.

Here's the post in the forum: https://forum.flowable.org/t/different-behaviour-between-parallel-and-inclusive-gateways-tests-using-async-and-or-exclusive-flags/3848

I've attached the project I used for my tests (8 BPMN files plus the Java test file): UnitTesting-ParallelGateway.zip

Best regards,
William

wberges, May 13 '19 07:05
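For reference on the four join-gateway settings tested above: in Flowable, an async step is continued in a separate job picked up by the async executor, and exclusive (the default for async steps) tells the executor not to run that job at the same time as other exclusive jobs of the same process instance. The Java sketch below models only that scheduling rule and is not Flowable's job executor: the class and method names are hypothetical, and a per-process-instance ReentrantLock stands in for the engine's own locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ExclusiveJobSketch {

    // Hypothetical model of exclusivity: one lock per process instance id.
    private final ConcurrentHashMap<String, ReentrantLock> instanceLocks = new ConcurrentHashMap<>();

    /** Runs a job body; exclusive jobs of the same process instance never overlap. */
    public void runJob(String processInstanceId, boolean exclusive, Runnable body) {
        if (!exclusive) {
            body.run(); // non-exclusive: may overlap with other jobs of the same instance
            return;
        }
        ReentrantLock lock = instanceLocks.computeIfAbsent(processInstanceId, id -> new ReentrantLock());
        lock.lock();
        try {
            body.run();
        } finally {
            lock.unlock();
        }
    }

    /** Runs `jobs` jobs of one process instance on a 4-thread pool; returns the peak overlap seen. */
    public static int maxObservedConcurrency(boolean exclusive, int jobs) {
        ExclusiveJobSketch executor = new ExclusiveJobSketch();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < jobs; i++) {
            pool.submit(() -> executor.runJob("proc-1", exclusive, () -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // simulate the work done inside the job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("exclusive peak: " + maxObservedConcurrency(true, 8));      // always 1
        System.out.println("non-exclusive peak: " + maxObservedConcurrency(false, 8)); // typically above 1
    }
}
```

The point of the model is that exclusivity only serializes jobs; it says nothing about code that runs outside a job (for example, a thread delivering an external message), which can still overlap with them.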

Hi William,

Thanks a lot for the detailed description. We will look into this asap and come back with our findings.

Best regards,

Tijs

tijsrademakers, May 13 '19 08:05

Hello,

Any news about this strange behaviour?

Here's a log generated by running my 5-branch test file: TEST_JUNIT_INCLUSIVE_ASYNC_EXCLUSIVE 5 branches.log

Thanks,
Best regards,
William

wberges, May 28 '19 14:05

Hi William,

No, not yet, sorry. It's still high on the to-do list, so it will be picked up soon.

Thanks,

Tijs

tijsrademakers, May 28 '19 14:05

Hi William,

We looked into this, and the problem is that the inclusive and parallel gateway behaviours also lock the process execution entity, separately from the locking done by the async and exclusive job handling. Because some async jobs are set to exclusive = false, they execute in parallel and there's a high chance of collision. The same thing happens with the parallel gateway, but its logic is much less complex, so the window in which a collision can occur is much shorter.

We've been discussing whether a separate job lock table might fix this issue, but that will need more thinking and experimenting.

An optimistic lock exception is thrown, but the process still finishes correctly in the end; the job is simply executed more than once. Is this causing issues on your end?

Best regards,

Tijs

tijsrademakers, Jun 03 '19 16:06
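The explanation above comes down to optimistic locking: two branches read the same shared execution entity, both try to write it back, and only the first commit wins. The Java sketch below is a generic illustration of that technique, not Flowable's code: an AtomicReference compare-and-set stands in for a versioned update (roughly `UPDATE ... WHERE ID_ = ? AND REV_ = ?`), and the exception class is a hypothetical stand-in for Flowable's FlowableOptimisticLockingException. The longer the gap between read() and update(), the more likely another branch commits in between, which matches the point that the more complex inclusive gateway logic collides more often.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockSketch {

    /** Hypothetical stand-in for Flowable's FlowableOptimisticLockingException. */
    public static class OptimisticLockException extends RuntimeException {
        public OptimisticLockException(String message) {
            super(message);
        }
    }

    /** Immutable snapshot of a stored row: some state plus a revision counter. */
    public static final class Row {
        private final String state;
        private final int revision;

        Row(String state, int revision) {
            this.state = state;
            this.revision = revision;
        }

        public String state() { return state; }
        public int revision() { return revision; }
    }

    // The "database row"; replaced by a new Row object on every successful update.
    private final AtomicReference<Row> stored = new AtomicReference<>(new Row("waiting at join", 1));

    /** Reads the current row (the start of a transaction's critical window). */
    public Row read() {
        return stored.get();
    }

    /** Writes back a change based on an earlier read; fails if someone else committed since then. */
    public void update(Row readEarlier, String newState) {
        Row next = new Row(newState, readEarlier.revision() + 1);
        // compareAndSet compares references, so it succeeds only if no other update replaced the row.
        if (!stored.compareAndSet(readEarlier, next)) {
            throw new OptimisticLockException(
                "row changed since revision " + readEarlier.revision() + ", now at " + stored.get().revision());
        }
    }

    public static void main(String[] args) {
        OptimisticLockSketch db = new OptimisticLockSketch();
        Row seenByBranchA = db.read();
        Row seenByBranchB = db.read();                 // both branches read revision 1
        db.update(seenByBranchA, "branch A arrived");  // commits revision 2
        try {
            db.update(seenByBranchB, "branch B arrived"); // stale revision 1: rejected
        } catch (OptimisticLockException e) {
            System.out.println("branch B must retry: " + e.getMessage());
        }
    }
}
```

Nothing is corrupted here: the losing writer just has to re-read and try again, which is why the engine can retry the failed job and still finish the process.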

Hello Tijs, First of all, thanks to your team for the bug analysis 😁 Regarding the issue, it is a problem when the external event thread is the one rejected due to an optimistic lock. In that case, if the external system doesn't resend the event, the event is lost, and unfortunately that's my situation. I could try adding an asynchronous task between the event and the gateway (equivalent to an "asynchronous after" on the receive event, which doesn't exist in Flowable) to force a potential retry by the engine. But that's a workaround (untested, so I'm not sure it will work), not a solution. Do you think that using a triggerable task in the event branch (instead of both the send and receive tasks) would solve the problem? Best regards, and thanks again for your wonderful work on Flowable. William

wberges, Jun 03 '19 20:06
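Until something changes on the engine side, the sender can at least avoid losing the event by retrying the delivery call when it fails with an optimistic lock exception. This is a minimal, engine-agnostic sketch: `withRetry` and the exception class are hypothetical. In a real application the Supplier would wrap the single engine call that delivers the event (for example `runtimeService.messageEventReceived(messageName, executionId)`, returning null), and the exception to catch would be Flowable's FlowableOptimisticLockingException. It is only safe if each attempt runs in its own transaction, so a failed attempt is fully rolled back before the next one.

```java
import java.util.function.Supplier;

public class RetryOnConflictSketch {

    /** Hypothetical stand-in for Flowable's FlowableOptimisticLockingException. */
    public static class OptimisticLockException extends RuntimeException {
        public OptimisticLockException(String message) {
            super(message);
        }
    }

    /** Runs action, retrying up to maxAttempts times while it fails with an optimistic lock conflict. */
    public static <T> T withRetry(Supplier<T> action, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (OptimisticLockException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller should persist or re-queue the event
                }
                try {
                    Thread.sleep(backoffMillis * attempt); // linear backoff between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new OptimisticLockException("concurrent update of the execution");
            }
            return "event delivered";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // event delivered after 3 attempts
    }
}
```

If all attempts fail, rethrowing (instead of swallowing the exception) leaves the caller free to store the event and deliver it later, so it is still not silently lost.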

Hello, still no solution for this bug concerning parallel execution and optimistic lock exceptions? It seems strange to me that I'm apparently the only one raising this problem, when (truly) parallel executions with events received from external systems should be common... Currently, the parallel gateway is useless if I have to use the same thread for all branches. Thanks for your feedback. Best regards

wberges, Oct 13 '19 20:10

Hi all, As proposed by @wberges, I have verified that this problem can be worked around by replacing all the steps of each branch with a dedicated call activity. The process definition now looks like this (image: with-callactivity), where the called process definitions are:

  • 'branche #1': BRANCHE_1_ASYNC
  • 'branche #2': BRANCHE_2_ASYNC
  • and 'branche #3': BRANCHE_3_ASYNC

To easily identify the sub-process that was started by the call activity and is waiting for the event, a unique business key can be used. Take care to propagate it correctly through the call activity.

I've attached the initial test project, updated with tests for this workaround: UnitTesting-ParallelGateway.zip

This workaround works fine with Flowable 6.3.1, but a new concurrency error occurs with Flowable 6.4.2 for the tests TEST_JUNIT_INCLUSIVE_ASYNC_CALLACTIVITY and TEST_JUNIT_INCLUSIVE_ASYNC_EXCLUSIVE_CALLACTIVITY:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@ TEST_JUNIT_INCLUSIVE_ASYNC_CALLACTIVITY @@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Tue Dec 10 09:22:51 CET 2019 ASYNC Step11 - before catch event
Tue Dec 10 09:22:51 CET 2019 ASYNC Step2 - execution before sleeping 3s
Tue Dec 10 09:22:51 CET 2019 ASYNC Step3 - execution before sleeping 10s
Tue Dec 10 09:22:52 CET 2019 EVENT Step12 - ##### MESSAGE RECEIVED #####
Tue Dec 10 09:22:54 CET 2019 ASYNC Step2 - execution after sleep
Tue Dec 10 09:23:01 CET 2019 ASYNC Step3 - execution after sleep
09:23:01,954 [flowable-async-job-executor-thread-2] ERROR org.flowable.common.engine.impl.interceptor.CommandContext  - Error while closing command context
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
	at java.util.HashMap$ValueIterator.next(HashMap.java:1474)
	at org.flowable.engine.impl.agenda.ExecuteInactiveBehaviorsOperation.run(ExecuteInactiveBehaviorsOperation.java:58)
	at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperation(CommandInvoker.java:88)
	at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperations(CommandInvoker.java:72)
	at org.flowable.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:62)
	at org.flowable.engine.impl.interceptor.BpmnOverrideContextInterceptor.execute(BpmnOverrideContextInterceptor.java:25)
	at org.flowable.common.engine.impl.interceptor.TransactionContextInterceptor.execute(TransactionContextInterceptor.java:53)
	at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:72)
	at org.flowable.common.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
	at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:56)
	at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:51)
	at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.executeJob(ExecuteAsyncRunnable.java:128)
	at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:116)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
09:23:01,958 [flowable-async-job-executor-thread-2] ERROR org.flowable.job.service.impl.asyncexecutor.DefaultAsyncRunnableExecutionExceptionHandler  - Job 304 failed

It looks like a HashMap is being used in a concurrent context. @tijsrademakers, shouldn't this map be created as a ConcurrentHashMap in org.flowable.engine.impl.util.CommandContextUtil#addInvolvedExecution(...)?

Regards

cdeneux, Dec 10 '19 08:12
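One thing worth keeping in mind when reading the stack trace above: HashMap's fail-fast iterator throws ConcurrentModificationException whenever the map is structurally modified during iteration, even within a single thread, while a ConcurrentHashMap iterator is weakly consistent and never throws it. The generic Java sketch below shows the single-thread case and the common fix of iterating over a snapshot copy. It does not claim to reproduce Flowable's code path, the method names are hypothetical, and the thread does not say which approach the eventual fix used.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CmeSketch {

    /** Iterates over the live values while the loop body adds entries; returns false on CME. */
    public static boolean iterateLive(Map<String, String> executions) {
        try {
            for (String value : executions.values()) {
                // Structural modification mid-iteration; fail-fast is best-effort per the JDK docs,
                // but with two or more entries the next iterator step detects it.
                executions.put(value + "-child", "child of " + value);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    /** Iterates over a snapshot, so entries added by the loop body are safe (and not visited). */
    public static int iterateSnapshot(Map<String, String> executions) {
        List<String> snapshot = new ArrayList<>(executions.values());
        for (String value : snapshot) {
            executions.put(value + "-child", "child of " + value);
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        Map<String, String> live = new HashMap<>(Map.of("e1", "e1", "e2", "e2"));
        System.out.println("live iteration ok? " + iterateLive(live)); // false: CME, single thread
        Map<String, String> copy = new HashMap<>(Map.of("e1", "e1", "e2", "e2"));
        System.out.println("visited " + iterateSnapshot(copy) + ", size now " + copy.size()); // visited 2, size now 4
    }
}
```

So a ConcurrentHashMap would stop the exception either way, but if the cause is re-entrant modification rather than multiple threads, a snapshot also makes it explicit which entries the loop processes.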

Hi Christoph,

Thanks for the detailed analysis, we'll check the HashMap and see if it should be changed to a ConcurrentHashMap.

Thanks

tijsrademakers, Dec 10 '19 09:12

Hello @tijsrademakers, do you know when you will be able to check this?

Thanks

realitix, Dec 13 '19 11:12

Hi,

Can't promise anything yet, but hopefully within the next couple of days.

tijsrademakers, Dec 13 '19 15:12

Hi Christoph,

Thanks for providing the test project, that made it really easy to reproduce the issue. We have applied a fix for the concurrent modification issue:

https://github.com/flowable/flowable-engine/commit/eb650424cec2a018b9538856986aa857ca0af01c

Let us know if you encounter any issues.

tijsrademakers, Dec 17 '19 21:12

Hi @tijsrademakers, Thanks for the fix. I applied it to version 6.4.2, and it solved the concurrency issue.

cdeneux, Dec 18 '19 14:12

Hello, Glad to see that this workaround can now be used without the new HashMap bug. :) But the question remains: is there still no real solution that lets (truly) parallel branches execute without optimistic lock exceptions (and a systematic replay of the whole branch, with all the problems that involves)? And it's strange that I'm again the only one raising this problem, when truly parallel executions (not simulated on the same thread) should be one of Flowable's basic features... Thanks for your work in any case :) Regards

wberges, Apr 27 '20 10:04
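The "systematic replay" concern is concrete: when a job fails with an optimistic lock exception, its transaction is rolled back and the job runs again, so anything it already did outside the database (a Camel call, a message to an external system) happens a second time. A common general-purpose mitigation, not one proposed in this thread, is to send a key that stays the same across retries and let the receiver drop duplicates. Minimal Java sketch; all names are hypothetical, and deriving the key from process instance id plus activity id assumes the activity runs only once per instance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentReceiverSketch {

    // Keys already processed; a real receiver would persist these durably.
    private final Set<String> seenKeys = new HashSet<>();
    private final List<String> processed = new ArrayList<>();

    /** Processes a request once per idempotency key; returns false for duplicates. */
    public synchronized boolean handle(String idempotencyKey, String payload) {
        if (!seenKeys.add(idempotencyKey)) {
            return false; // duplicate caused by a job retry: ignore it
        }
        processed.add(payload);
        return true;
    }

    /** Returns a copy of the payloads that were actually processed. */
    public synchronized List<String> processed() {
        return new ArrayList<>(processed);
    }

    /** Hypothetical key that is identical across retries of the same job. */
    public static String keyFor(String processInstanceId, String activityId) {
        return processInstanceId + ":" + activityId;
    }

    public static void main(String[] args) {
        IdempotentReceiverSketch receiver = new IdempotentReceiverSketch();
        String key = keyFor("proc-42", "sendOrder");
        receiver.handle(key, "order #1"); // first attempt
        receiver.handle(key, "order #1"); // same job re-executed after an optimistic lock failure
        System.out.println(receiver.processed()); // [order #1]
    }
}
```

This does not prevent the replay itself; it only makes the replay harmless for receivers that can be changed to honour the key.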

Hi, I have a question about this behaviour: if we set the join gateway (the final <+> of the parallel gateway pair) as Async and Exclusive, is only the join job replayed in case of an optimistic lock exception? If so, it's less problematic, because we wouldn't replay a task (including its action or event), just the final gateway (storage). Thanks for your help. Regards

wberges, May 12 '20 12:05

BTW, there's a bug in (at least) the 6.4.2 modeler: when I set both the Async and Exclusive flags on the parallel join gateway and then export and re-import the BPMN, both flags are lost...

wberges, May 12 '20 15:05
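To catch the modeler round-trip problem described above before deployment, one option is to check the exported BPMN XML for the flags. The Java sketch below uses only the JDK's DOM parser, and the class and method names are hypothetical. It assumes the flags are serialized as flowable:async and flowable:exclusive attributes in the http://flowable.org/bpmn namespace, and it treats a missing exclusive attribute as exclusive, since that is the default for async steps.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class GatewayFlagCheck {

    static final String BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL";
    static final String FLOWABLE_NS = "http://flowable.org/bpmn"; // assumed Flowable extension namespace

    /** Returns ids of gateways with the given tag that are not async or are explicitly non-exclusive. */
    public static List<String> gatewaysMissingFlags(String bpmnXml, String gatewayTag) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bpmnXml.getBytes(StandardCharsets.UTF_8)));

            List<String> missing = new ArrayList<>();
            NodeList gateways = doc.getElementsByTagNameNS(BPMN_NS, gatewayTag);
            for (int i = 0; i < gateways.getLength(); i++) {
                Element gateway = (Element) gateways.item(i);
                // getAttributeNS returns "" when the attribute is absent
                boolean async = "true".equals(gateway.getAttributeNS(FLOWABLE_NS, "async"));
                boolean exclusive = !"false".equals(gateway.getAttributeNS(FLOWABLE_NS, "exclusive"));
                if (!async || !exclusive) {
                    missing.add(gateway.getAttribute("id"));
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse BPMN XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<definitions xmlns='" + BPMN_NS + "' xmlns:flowable='" + FLOWABLE_NS + "'>"
                + "<process id='p'>"
                + "<parallelGateway id='joinKept' flowable:async='true' flowable:exclusive='true'/>"
                + "<parallelGateway id='joinLost'/>"
                + "</process></definitions>";
        System.out.println(gatewaysMissingFlags(xml, "parallelGateway")); // [joinLost]
    }
}
```

Run against the exported file in a build step, a non-empty result would flag a join gateway whose settings did not survive the export.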

Hello, I'm coming back to this problem because we still have it. Here's the new example (image): I have an inclusive gateway with several branches activated. None of the branches ends in a joining inclusive gateway (which I could set as ASYNC and EXCLUSIVE to avoid the optimistic locking); each one ends in an end event instead. In this case, I run into the same locking problem again. And there's no way to synchronize the branches except by adding an artificial inclusive join gateway just so it can be set as ASYNC and EXCLUSIVE, which is a pity for the workflow design... Do you have any tips to avoid this workaround? Thanks a lot again for your work :) Best regards

wberges, Oct 28 '22 08:10

Hi there. I'm evaluating Flowable's reliability and support with a view to using it in a critical system.

I'm worried about running into bugs like this one, which have gone unanswered and are still open after a long time. Is there a rationale behind this?

PinoEire, May 03 '23 07:05