camunda-bpm-platform
camunda-bpm-platform copied to clipboard
On DB2, Update removal time of batch job stacktraces results in transaction timeouts
Environment (Required on creation)
- DB2 Version 11.5.7
- JDBC Driver 4.31.10_1
- Camunda 7.18
Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)
Batch operation takes too long to complete Cardinalities:
Query | Table | Number |
---|---|---|
select count * from ACT_GE_BYTEARRAY |
48 935 974 | |
select count * from ACT_GE_BYTEARRAY WHERE NAME_='job.exceptionByteArray' |
3 904 303 | |
select count * from ACT_HI_JOB_LOG |
298 948 766 | |
select substr(tabname,1,40), card, stats_time from syscat.tables where tabname in ('ACT_GE_BYTEARRAY', 'ACT_HI_JOB_LOG') |
ACT_GE_BYTEARRAY |
49 000 331 |
select substr(tabname,1,40), card, stats_time from syscat.tables where tabname in ('ACT_GE_BYTEARRAY', 'ACT_HI_JOB_LOG') |
ACT_HI_JOB_LOG |
300 785 913 |
Steps to reproduce (Required on creation)
- Has cardinalities as described above
- Start a batch operation that can have failing jobs that are resolved with retry
- Wait for the batch to complete
Observed Behavior (Required on creation)
Transactions timeouts in server log.
Batch is taking longer time to complete even though all jobs are completed. After checking the database logs the execution of update ACT_GE_BYTEARRAY set REMOVAL_TIME_ = ? where ID_ IN ( SELECT JOB_EXCEPTION_STACK_ID_ FROM ACT_HI_JOB_LOG WHERE JOB_DEF_CONFIGURATION_ = ? )
is taking long time to complete
Expected behavior (Required on creation)
No transaction timeouts, the query doesn't slow down batch completion.
Root Cause (Required on prioritization)
With given cardinalities, a full table scan is performed on ACT_GE_BYTEARRAY
(n=48 935 974
) and no index is used. Query plan:
https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=328543&focusedId=328543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-328543
Solution Ideas
For DB2 database: Adjust the query to include NAME_ = "job.exceptionByteArray"
in the WHERE
clause to filter out queried byte arrays and potentially use ACT_IDX_BYTEARRAY_NAME
to speed up query execution. Query plan:
https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=328543&focusedId=328543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-328543
Total cost original query: 3.11157e+07
Total cost adjusted query: 2.25563e+06
Total cost reduce: 92.75 % faster
- Alternative solution: implementation of feature https://github.com/camunda/camunda-bpm-platform/issues/2322
Hints
Links
- https://jira.camunda.com/browse/SUPPORT-15726
- The history of the ticket is longer. Issue is reported in https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=321421&focusedId=321421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-321421
- https://github.com/camunda/camunda-bpm-platform/issues/2322
Breakdown
### Pull Requests
Dev2QA handover
- [ ] Does this ticket need a QA test and the testing goals are not clear from the description? Add a Dev2QA handover comment