flowable-engine
Different behaviour between parallel and inclusive gateways
Hello,
It seems that Parallel and Inclusive Gateways don't behave the same way (in the same configuration).
Tests were done on processes with 3 parallel branches (1 branch containing a catching message event). I used 2 test sets: 1 using Parallel gateways and 1 using Inclusive gateways. The Inclusive gateway always activates all 3 branches (so, for this test, it should be equivalent to the Parallel gateway, without a default branch).
Test set 1: Parallel gateways
Test set 2: Inclusive gateways
For each set, different settings on the Join gateway (4) were tested (see the sketch after this list for how these flags appear on the parsed model):
- Sync (no Async & Exclusive flags)
- Sync & Exclusive
- Async
- Async & Exclusive
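For readers less familiar with these flags, here is a minimal sketch, using the org.flowable.bpmn.model API, of how the Async and Exclusive settings above show up on the parsed join gateway. The element id "joinGateway" and the helper itself are my own assumptions, not taken from the attached project:

import org.flowable.bpmn.model.BpmnModel;
import org.flowable.bpmn.model.ParallelGateway;
import org.flowable.engine.RepositoryService;

public class JoinGatewayFlags {

    // Hypothetical helper: reads the Async/Exclusive flags of the join gateway from a deployed definition.
    public static void printJoinFlags(RepositoryService repositoryService, String processDefinitionId) {
        BpmnModel model = repositoryService.getBpmnModel(processDefinitionId);
        ParallelGateway join = (ParallelGateway) model.getMainProcess().getFlowElement("joinGateway");

        // "Async" flag: the gateway is executed by the async executor as a job
        boolean async = join.isAsynchronous();
        // "Exclusive" flag: at most one exclusive job per process instance is executed at a time
        boolean exclusive = join.isExclusive();

        System.out.println("join async=" + async + ", exclusive=" + exclusive);
    }
}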
What is strange to me is that we do not get the same result with parallel and inclusive gateways. With the Parallel gateway, 2 scenarios complete without an OptimisticLock exception. But with the Inclusive gateway, all scenarios generate an OptimisticLock exception! And it's really problematic (to me at least :)).
For now, I see 2 "workarounds" to avoid OptimisticLockExceptions:
- use a Triggerable task to embed both the Send and Receive tasks. But in this case there is no way to use a Camel task, and I'm not sure it would solve the problem.
- convert my parallel process into a serial process (3 branches in 1), which is really not what I want...
Here's the post in the forum: https://forum.flowable.org/t/different-behaviour-between-parallel-and-inclusive-gateways-tests-using-async-and-or-exclusive-flags/3848
I have attached the project I used for my tests (it includes 8 BPMN files & a Java test file): UnitTesting-ParallelGateway.zip
Best Regards,
William
Hi William,
Thanks a lot for the detailed description. We will look into this asap and come back with our findings.
Best regards,
Tijs
Hello,
Is there any news about this strange behaviour?
Here's a log (generated by running my 5 branches test file): TEST_JUNIT_INCLUSIVE_ASYNC_EXCLUSIVE 5 branches.log
Thanks and Best Regards,
William
Hi William,
No, not yet, sorry. It's still high on the todo list, so it will be picked up soon.
Thanks,
Tijs
Hi William,
We looked into this and the problem is that the inclusive and parallel gateway behaviours also lock the process execution entity, separately from the locking that happens in the async and exclusive job handling. Because some async jobs are set to exclusive = false, parallel execution happens and there's a big chance of collision. The same logic applies to the parallel gateway, but the logic implemented there is a lot less complex, so the window in which a collision can happen is much smaller.
We've been discussing that a separate job lock table might fix this issue, but that will need some more thinking and experimenting.
In the end there is an optimistic lock exception, but the process still finishes correctly; the job is just executed more than once. Is this causing issues on your end?
Best regards,
Tijs
Hello Tijs,
First of all, thanks to your team for the bug analysis 😁
About the issue: it is a problem when it is the external event thread that gets rejected due to an optimistic lock. In that case, if the external system doesn't resend the event, the event is lost, and that is unfortunately my situation. I can try to add an asynchronous task between the event and the gateway (equivalent to an "Asynchronous After" on the Receive Event, which doesn't exist in Flowable) to force a potential retry by the engine. But that is a workaround (not tested, so I'm not sure it will work), not a solution. Do you think that using a triggerable task in the event branch (in place of both the send and receive tasks) would solve the problem?
Best regards and thanks again for your wonderful work on Flowable.
William
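For anyone hitting the same situation, here is a minimal sketch of how the external delivery thread could avoid losing the event. The helper is my own assumption (not part of the attached project) and it assumes the caller already knows the execution id waiting on the catching message event; messageEventReceivedAsync hands the delivery over to the async executor as a job, which the engine retries itself instead of propagating the optimistic lock exception back to the external system:

import org.flowable.common.engine.api.FlowableOptimisticLockingException;
import org.flowable.engine.RuntimeService;

public class SafeMessageDelivery {

    // Hypothetical helper, not part of the attached test project.
    public static void deliver(RuntimeService runtimeService, String messageName, String executionId) {
        try {
            // Synchronous correlation: runs (and can fail) on the external caller's thread
            runtimeService.messageEventReceived(messageName, executionId);
        } catch (FlowableOptimisticLockingException e) {
            // Fall back to asynchronous delivery: the message becomes a job owned and
            // retried by the async executor, so the event is not lost on a collision
            runtimeService.messageEventReceivedAsync(messageName, executionId);
        }
    }
}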
Hello,
Is there still no solution for this bug concerning parallel execution with optimistic lock exceptions? What is strange to me is that I seem to be the only one raising this problem, when (real) parallel executions with events received from external systems should be quite common... Currently, the parallel gateway is useless to me if I have to use the same thread for all branches.
Thanks for your feedback.
Best regards
Hi all,
As proposed by @wberges, I have checked that this problem can be worked around by replacing all the steps of each branch with a dedicated call activity. The process definition now looks like this:
where the process definitions are:
- 'branche #1':
- 'branche #2':
- and 'branche #3':
To easily identify the sub-process that is started by the call activity and is waiting for the event, a unique business key can be used; be careful to propagate it correctly on the call activity. A sketch of this correlation follows.
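This is a minimal sketch of that lookup, with a hypothetical message name "myMessage" and a helper that are not taken from the attached project: find the execution waiting on the catching message event inside the sub-process instance identified by the unique business key, then deliver the message to it.

import org.flowable.engine.RuntimeService;
import org.flowable.engine.runtime.Execution;

public class BusinessKeyCorrelation {

    // Hypothetical helper: correlate a message to the call-activity sub-process
    // identified by the unique business key propagated from the parent process.
    public static void correlate(RuntimeService runtimeService, String businessKey) {
        Execution waiting = runtimeService.createExecutionQuery()
                .processInstanceBusinessKey(businessKey)
                .messageEventSubscriptionName("myMessage")
                .singleResult();
        runtimeService.messageEventReceived("myMessage", waiting.getId());
    }
}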
I have attached the initial test project, updated with tests for this workaround: UnitTesting-ParallelGateway.zip
This workaround works fine with Flowable 6.3.1, but a new concurrency error occurs with Flowable 6.4.2 for the tests TEST_JUNIT_INCLUSIVE_ASYNC_CALLACTIVITY and TEST_JUNIT_INCLUSIVE_ASYNC_EXCLUSIVE_CALLACTIVITY:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@ TEST_JUNIT_INCLUSIVE_ASYNC_CALLACTIVITY @@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Tue Dec 10 09:22:51 CET 2019 ASYNC Step11 - before catch event
Tue Dec 10 09:22:51 CET 2019 ASYNC Step2 - execution before sleeping 3s
Tue Dec 10 09:22:51 CET 2019 ASYNC Step3 - execution before sleeping 10s
Tue Dec 10 09:22:52 CET 2019 EVENT Step12 - ##### MESSAGE RECEIVED #####
Tue Dec 10 09:22:54 CET 2019 ASYNC Step2 - execution after sleep
Tue Dec 10 09:23:01 CET 2019 ASYNC Step3 - execution after sleep
09:23:01,954 [flowable-async-job-executor-thread-2] ERROR org.flowable.common.engine.impl.interceptor.CommandContext - Error while closing command context
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
at java.util.HashMap$ValueIterator.next(HashMap.java:1474)
at org.flowable.engine.impl.agenda.ExecuteInactiveBehaviorsOperation.run(ExecuteInactiveBehaviorsOperation.java:58)
at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperation(CommandInvoker.java:88)
at org.flowable.engine.impl.interceptor.CommandInvoker.executeOperations(CommandInvoker.java:72)
at org.flowable.engine.impl.interceptor.CommandInvoker.execute(CommandInvoker.java:62)
at org.flowable.engine.impl.interceptor.BpmnOverrideContextInterceptor.execute(BpmnOverrideContextInterceptor.java:25)
at org.flowable.common.engine.impl.interceptor.TransactionContextInterceptor.execute(TransactionContextInterceptor.java:53)
at org.flowable.common.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:72)
at org.flowable.common.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:30)
at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:56)
at org.flowable.common.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:51)
at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.executeJob(ExecuteAsyncRunnable.java:128)
at org.flowable.job.service.impl.asyncexecutor.ExecuteAsyncRunnable.run(ExecuteAsyncRunnable.java:116)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
09:23:01,958 [flowable-async-job-executor-thread-2] ERROR org.flowable.job.service.impl.asyncexecutor.DefaultAsyncRunnableExecutionExceptionHandler - Job 304 failed
A HashMap seems to be used in a concurrent context. @tijsrademakers, should this map not be created as a ConcurrentHashMap in org.flowable.engine.impl.util.CommandContextUtil#addInvolvedExecution(...)?
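For context, here is a tiny standalone example (plain JDK, unrelated to the Flowable code itself) of why iterating a plain HashMap while it is being modified produces exactly this exception, and why a ConcurrentHashMap, whose iterators are weakly consistent, would not:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, String> plain = new HashMap<>();
        plain.put("exec-1", "a");
        plain.put("exec-2", "b");
        try {
            for (String key : plain.keySet()) {
                // Structural modification while an iterator is open: the fail-fast
                // HashMap iterator throws ConcurrentModificationException on its next step
                plain.put("exec-3", "c");
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("HashMap iterator failed fast: " + e);
        }

        Map<String, String> concurrent = new ConcurrentHashMap<>();
        concurrent.put("exec-1", "a");
        concurrent.put("exec-2", "b");
        for (String key : concurrent.keySet()) {
            // Weakly consistent iterator: the concurrent modification is tolerated
            concurrent.put("exec-3", "c");
        }
        System.out.println("ConcurrentHashMap iteration completed without an exception");
    }
}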
Regards
Hi Christoph,
Thanks for the detailed analysis, we'll check the HashMap and see if it should be changed to a ConcurrentHashMap.
Thanks
Hello @tijsrademakers, do you know when you will be able to check this?
Thanks
Hi,
Can't promise anything yet, but hopefully within the next couple of days
Hi Christoph,
Thanks for providing the test project, that made it really easy to reproduce the issue. We have applied a fix for the concurrent modification issue:
https://github.com/flowable/flowable-engine/commit/eb650424cec2a018b9538856986aa857ca0af01c
Let us know if you encounter any issues.
Hi @tijsrademakers,
Thanks for the fix. I applied it on version 6.4.2 and it solved the concurrency issue.
Hello,
Glad to see that this workaround can now be used without that new HashMap bug. :) But the question remains the same: is there still no true solution that allows executing (real) parallel branches without an optimistic lock exception (and a systematic replay of the whole branch, with all the problems that involves)? And it is strange that I am again the only one raising this problem, when (real, not simulated using the same thread) parallel executions should be part of the basic Flowable features...
Thanks for your work in any case. :)
Regards
Hi,
I have a question about this behaviour: if we set the "join" gateway (the final <+> of the parallel gateway) as "Async" and "Exclusive", is it only the "join" job that will be replayed in case of an OptimisticLock exception? If so, it is less problematic, because we don't replay a task (including its action/event), but just the final gateway (storage).
Thanks for your help.
Regards
BTW, there's a bug at least in the 6.4.2 modeler: when I set both the Async and Exclusive flags on the join parallel gateway and then export/import the BPMN, both flags are lost...
Hello,
I'm coming back to this problem because we still have it.
The new example is the following:
I have an inclusive gateway with several branches activated. Each branch doesn't end in a join inclusive gateway (which I could set as ASYNC and EXCLUSIVE to avoid the optimistic locking), but on an End event. In this case I run into my locking problem again, and there is no way to define a synchronization between branches except by adding an artificial inclusive join gateway just so it can be set as ASYNC and EXCLUSIVE. For the design of the workflow, that's a pity...
Do you have some tips to avoid this workaround?
Thanks a lot again for your work :)
Best Regards
Hi there. I'm looking at Flowable's reliability and support to evaluate its use in a critical system.
I'm worried about finding bugs like this one that have no answer and are still open after a long time. Is there a rationale behind this?