camunda-bpm-platform
camunda-bpm-platform copied to clipboard
Improve performance of DbOperationManager#sortByReferences
Environment (Required on creation)
Standalone java spring boot application using camunda-bpm-spring-boot-starter
Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)
org.camunda.bpm.engine.impl.db.entitymanager.operation.DbOperationManager#sortByReferences uses a nested for-loop which results in an O(n^2) sort complexity. This manifests in operations being very slow when performed on a large dataset. For example, deleting ~100,000 process instances takes more than a day (process had to eventually be terminated). Considering that the method takes in a sorted set of entity-operations which already have info about dependent operations, this method can be improved by using a topological sort instead, which will perform much better with a complexity of O(e+d) where 'e' is the number of entity-operations and 'd' is the number of dependent operations.
In practice, when tried locally on a machine with a decent config, it takes about the same amount of time (~1.5 secs) for deleting 100 instances with either approach, but for 1000 instances it takes around 82 secs with the existing sort algorithm and 18 secs with the approach mentioned above. Further increasing the number of instances to 2000, the times taken are 432 secs and 61 secs.
Added test SortByRefPerfTest to demonstrate the issue.
Note: The example above uses instance deletion as an example but the behaviour would be similar for all other database operations as well, with the severity depending on the number of dependent operations. For example, instances with a large number of variables will be affected more than instances with fewer variables.
Note: This is more of an enhancement idea than a bug report but couldn't find an appropriate category when creating this issue. Also found other similar performance improvements raised as bugs so categorizing this as a bug for now.
Steps to reproduce (Required on creation)
Create ~1000 process instances and delete all of them in a single transaction.
Observed Behavior (Required on creation)
Takes longer than necessary to complete
Expected behavior (Required on creation)
Should complete in a reasonable amount of time based on the number of instances.
Root Cause (Required on prioritization)
Nested for-loop based sorting performed in DbOperationManager#sortByReferences
Solution Ideas
Use graph based topological sort
Hints
Links
Breakdown
Dev2QA handover
Hi @juja0, thanks for opening the issue!
You've provided lots of infos which is great, we will verify your optimization soon. To be honest, I'm not familiar with topological sort right now, so it would require some research from my side. Does your solution still produce the same ordering? Or you would expect some slight changes in the result?
As of the code, I can already recommend to add some more inline comments to make it clear what's happening. Additionally, the deep nesting of for-if-for-if-if..
is not very elegant.
This is not a bug though, I'll change it to a feature.
-Daniel
Does your solution still produce the same ordering
No, it does not necessarily produce the same ordering but the ordering it does produce respects the dependencies between operations. The idea being that, operations with no dependencies are grouped into one and appear first in the final sorted list, followed by the next group of operations that only depend on the first group, followed by the next group which depend on only the second group or both the first and second groups and so on.
add some more inline comments to make it clear what's happening
Added comments
the deep nesting of for-if-for-if-if.. is not very elegant
Sure. The repro was intended to be a quick demo of the issue. It was not intended to be the final submission. Once we've concluded our discussion, I can refactor it.
I'll change it to a feature.
Thanks
Have you verified that the performance problem is caused by the sorting logic and not by other factors (e.g. the database may struggle with transactions that have thousands of statements)?
Yes. The numbers quoted on the original issue report are from the unit test that I've attached which can be executed with and without the suggested optimization to see the difference. In both cases, the database and everything else is exactly the same so the performance difference is due to the sorting logic.
Hi @ThorbenLindhauer , did you get a chance to look at the sample unit test ? do you think it's worth incorporating something like that in the platform ?
Hey @juja0,
my collague Daniel will get back to this. Please bear with us, as we are currently focusing on the 7.20 release due for next month and as this is a not a trivial change, it may take a bit.
Best regards, Thorben
No worries. Thanks for letting me know.
hi. i have a problem with this sortByReferences. The application is hungs with stacktrace. I think improvement performance of DbOperationManager is needed.
"camundaTaskExecutor-1@27837" prio=5 tid=0x15a nid=NA runnable
java.lang.Thread.State: RUNNABLE
at org.camunda.bpm.engine.impl.db.entitymanager.operation.DbOperationManager.sortByReferences(DbOperationManager.java:215)
at org.camunda.bpm.engine.impl.db.entitymanager.operation.DbOperationManager.addSortedInserts(DbOperationManager.java:150)
at org.camunda.bpm.engine.impl.db.entitymanager.operation.DbOperationManager.calculateFlush(DbOperationManager.java:134)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperationManager(DbEntityManager.java:305)
at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flush(DbEntityManager.java:295)
at org.camunda.bpm.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:272)
at org.camunda.bpm.engine.impl.interceptor.CommandContext.close(CommandContext.java:188)
at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:119)
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.lambda$execute$0(SpringTransactionInterceptor.java:71)
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor$$Lambda$1936/0x0000000801206360.doInTransaction(Unknown Source:-1)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:71)
at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:70)
at org.camunda.bpm.engine.impl.interceptor.CommandCounterInterceptor.execute(CommandCounterInterceptor.java:35)
at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:33)
at org.camunda.bpm.engine.impl.interceptor.ExceptionCodeInterceptor.execute(ExceptionCodeInterceptor.java:55)
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:57)
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.executeJob(ExecuteJobsRunnable.java:110)
at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.run(ExecuteJobsRunnable.java:71)
at org.springframework.cloud.sleuth.instrument.async.TraceRunnable.run(TraceRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:833)
Hi @juja0, I finally had some more time to dive into this improvement. Thank you again for taking the time to prepare!
I think I understand the idea behind this and generally it makes sense to me. The test looks indeed faster with your solution, although it's just one example use-case. I noticed that in the test there are no references at all, so it's not really proving if the new logic works properly or not. In fact, it skips a lot of loops and that might justify the speed difference. The original code iterates over the operations in a nested loop regardless whether there are references or not. (This could already be a slight improvement probably.)
This code is very crucial in the engine and right now I definitely don't feel confident enough for this to be merged. So primarily I see two possibilities how we could progress here:
- You could add more tests that actually prove that the ordering of the operations is well respected. We could even compare the output from the two implementations. I know they might be slightly different, but the operations with refs should still be in proper order. I could also run this with our CI just to see if anything obvious fails.
- Maybe we could have a more minimal change that still makes the logic faster? This could be much easier to justify and merge. However, I'm not sure if we can extract just some parts from this logic.
Let me know what you think.
-Daniel
Thanks @danielkelemen . The test that was attached in the OP was just to pick a simple example to demonstrate a potential improvement. It was not intended as a full-blown PR but just to get us on the same page about whether or not there's value in incorporating this change in camunda. I understand the apprehension in making changes in such a crucial component which is why I wanted to start this conversation before jumping into creating a full PR.
For what it's worth, we have this solution running in our production environment for around 2 years and it's been working fine. That version of this solution is a slightly cleaned up version of the one in the attached test and it also has unit tests (including references) to verify correctness.
I'm not sure if we can extract just some parts from this logic.
You're right that we can't just use some parts of this logic so it's got to be all-or-nothing. With that in mind, if you still think this could add value, I can submit a fresh PR (including the unit tests with references that I mentioned before) so we can check if CI passes with the proposed solution.