camunda-bpm-platform icon indicating copy to clipboard operation
camunda-bpm-platform copied to clipboard

On DB2, Update removal time of batch job stacktraces results in transaction timeouts

Open yanavasileva opened this issue 1 year ago • 0 comments

Environment (Required on creation)

  • DB2 Version 11.5.7
  • JDBC Driver 4.31.10_1
  • Camunda 7.18

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)

Batch operation takes too long to complete Cardinalities:

Query Table Number
select count * from ACT_GE_BYTEARRAY 48 935 974
select count * from ACT_GE_BYTEARRAY WHERE NAME_='job.exceptionByteArray' 3 904 303
select count * from ACT_HI_JOB_LOG 298 948 766
select substr(tabname,1,40), card, stats_time from syscat.tables where tabname in ('ACT_GE_BYTEARRAY', 'ACT_HI_JOB_LOG') ACT_GE_BYTEARRAY 49 000 331
select substr(tabname,1,40), card, stats_time from syscat.tables where tabname in ('ACT_GE_BYTEARRAY', 'ACT_HI_JOB_LOG') ACT_HI_JOB_LOG 300 785 913

Steps to reproduce (Required on creation)

  1. Has cardinalities as described above
  2. Start a batch operation that can have failing jobs that are resolved with retry
  3. Wait for the batch to complete

Observed Behavior (Required on creation)

Transactions timeouts in server log. Batch is taking longer time to complete even though all jobs are completed. After checking the database logs the execution of update ACT_GE_BYTEARRAY set REMOVAL_TIME_ = ? where ID_ IN ( SELECT JOB_EXCEPTION_STACK_ID_ FROM ACT_HI_JOB_LOG WHERE JOB_DEF_CONFIGURATION_ = ? ) is taking long time to complete

Expected behavior (Required on creation)

No transaction timeouts, the query doesn't slow down batch completion.

Root Cause (Required on prioritization)

With given cardinalities, a full table scan is performed on ACT_GE_BYTEARRAY (n=48 935 974) and no index is used. Query plan: https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=328543&focusedId=328543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-328543

Solution Ideas

For DB2 database: Adjust the query to include NAME_ = "job.exceptionByteArray" in the WHERE clause to filter out queried byte arrays and potentially use ACT_IDX_BYTEARRAY_NAME to speed up query execution. Query plan: https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=328543&focusedId=328543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-328543

Total cost original query: 3.11157e+07 Total cost adjusted query: 2.25563e+06 Total cost reduce: 92.75 % faster

  • Alternative solution: implementation of feature https://github.com/camunda/camunda-bpm-platform/issues/2322

Hints

Links

  • https://jira.camunda.com/browse/SUPPORT-15726
    • The history of the ticket is longer. Issue is reported in https://jira.camunda.com/browse/SUPPORT-15726?focusedCommentId=321421&focusedId=321421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-321421
  • https://github.com/camunda/camunda-bpm-platform/issues/2322

Breakdown

### Pull Requests

Dev2QA handover

  • [ ] Does this ticket need a QA test and the testing goals are not clear from the description? Add a Dev2QA handover comment

yanavasileva avatar Feb 01 '24 08:02 yanavasileva