
When deleting a process definition, message event subscription is not removed

Open ThorbenLindhauer opened this issue 4 years ago • 1 comments

This issue was imported from JIRA:

JIRA Link: CAM-12110
Reporter: @tasso94
Has restricted visibility comments: true

Environment (Required on creation)

Camunda <= 7.21.

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)

When the last two versions of a process definition with a message start event are deleted in parallel, the message event subscription of the latest process definition is not removed and becomes orphaned.

Steps to reproduce (Required on creation)

Given

  • Thread t1 deletes process definition x in version 1 (x1)
  • Thread t2 deletes process definition x in version 2 (x2)
  • Thread t1 already executed the command, so the thread is about to flush and commit the DB changes

Scenario

  1. t2 determines the new latest process definition -> x1 is considered to be the new latest process definition
  2. t2 deletes x2
  3. t2 starts to ensure consistency with respect to the new latest process definition of x1
  4. t2 resolves x1 from the cache
  5. t2 makes sure that existing event subscriptions get persisted eventually
  6. t1 flushes and commits -> process definition x1 is deleted from the database and removed from the cache
  7. t2 flushes and commits -> process definition x2 is deleted from the database and removed from the cache
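
The interleaving above can be replayed deterministically with a small in-memory model. This is only a sketch under stated assumptions: the class `OrphanRaceSketch`, the `Tx` helper, and the message name `orderReceived` are hypothetical and are not engine code. Each transaction queues its writes and applies them on commit, which is enough to show why t2 still restores a subscription for x1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the scenario; none of these names are Camunda APIs.
class OrphanRaceSketch {

    // A "transaction" that queues writes and applies them only on commit.
    static class Tx {
        private final List<Runnable> pending = new ArrayList<>();
        void enqueue(Runnable write) { pending.add(write); }
        void commit() { pending.forEach(Runnable::run); pending.clear(); }
    }

    static class Result {
        final Set<String> definitions;
        final Map<String, String> subscriptions;
        Result(Set<String> definitions, Map<String, String> subscriptions) {
            this.definitions = definitions;
            this.subscriptions = subscriptions;
        }
    }

    static Result replay() {
        // Committed state: two versions exist, the latest (x2) owns the subscription.
        Set<String> definitions = new HashSet<>(Set.of("x1", "x2"));
        Set<String> cache = new HashSet<>(Set.of("x1", "x2"));
        Map<String, String> subscriptions = new HashMap<>(Map.of("orderReceived", "x2"));

        Tx t1 = new Tx();
        Tx t2 = new Tx();

        // Given: t1 has executed "delete x1" but has not flushed yet.
        t1.enqueue(() -> { definitions.remove("x1"); cache.remove("x1"); });

        // Step 1: t2 still sees x1 and picks it as the new latest version.
        String newLatest = definitions.contains("x1") ? "x1" : null;

        // Step 2: t2 deletes x2 together with its subscription.
        t2.enqueue(() -> {
            definitions.remove("x2");
            cache.remove("x2");
            subscriptions.remove("orderReceived");
        });

        // Steps 3-5: t2 resolves x1 from the cache and queues a subscription for it.
        if (newLatest != null && cache.contains(newLatest)) {
            t2.enqueue(() -> subscriptions.put("orderReceived", newLatest));
        }

        t1.commit(); // Step 6: x1 is gone.
        t2.commit(); // Step 7: x2 is gone, but a subscription for x1 is written.

        return new Result(definitions, subscriptions);
    }
}
```

After `replay()`, the set of definitions is empty while the subscription map still contains an entry pointing to x1, which matches the observed behavior below.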

Observed Behavior (Required on creation)

  • The process definition x1 got deleted
  • The process definition x2 got deleted
  • But there exists an orphaned event subscription in the table ACT_RU_EVENT_SUBSCR pointing to x1
  • When somebody now tries to deploy a process with the same message name, then that deployment will fail
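
For diagnostics, "orphaned" can be stated as a simple check: a subscription is orphaned when the process definition it refers to no longer exists. The following is a minimal sketch over plain collections, assuming the caller has already loaded subscription ids with their referenced definition ids and the set of existing definition ids; `OrphanFinderSketch` and `findOrphans` are hypothetical names, not part of the engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; it operates on plain collections, not on engine entities.
class OrphanFinderSketch {

    // subscriptionToDefinition: subscription id -> referenced process definition id.
    // Returns the ids of subscriptions whose definition is not among the existing ones.
    static List<String> findOrphans(Map<String, String> subscriptionToDefinition,
                                    Set<String> existingDefinitionIds) {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> entry : subscriptionToDefinition.entrySet()) {
            if (!existingDefinitionIds.contains(entry.getValue())) {
                orphans.add(entry.getKey());
            }
        }
        return orphans;
    }
}
```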

Expected behavior (Required on creation)

Depends on the chosen solution:

  • Synchronize deletion so that t2 fails with an OptimisticLockingException.
  • Ensure that the event subscription is not restored when the process definition has been deleted.

Root Cause (Required on prioritization)

ACT_RU_EVENT_SUBSCR rows have no foreign key relation to the process definition.

Solution Ideas

  1. Event subscription points to process definition with a foreign key relation.
    • Like this, an OptimisticLockingException is thrown.
  2. Only restore the event subscription when the process definition exists.
    • Like this, an orphaned event subscription is not restored.
  3. A redeployment of the process definition fixes the orphaned event subscription.
    • The orphaned event subscription is updated with the process definition id and potentially other information.

Hints

Check if this problem exists for the following scenarios as well:

  • Signal start event.
  • Conditional start event.
  • Other event subscriptions than for start events.
  • Timer start event (timer declaration).
  • Other timer declarations.

Links

  • https://jira.camunda.com/browse/SUPPORT-8032
  • https://jira.camunda.com/browse/SUPPORT-20072

Breakdown

### Pull Requests
- [x] Failing test case: https://github.com/camunda/camunda-bpm-platform/pull/4065
- [ ] https://github.com/camunda/camunda-bpm-platform/pull/4364
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1227
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1228
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1229

Dev2QA handover

  • [ ] Does this ticket need a QA test and the testing goals are not clear from the description? Add a Dev2QA handover comment

ThorbenLindhauer, Jun 29 '20 15:06

Out of the proposed solutions, these are the results:

  1. Event subscription points to process definition with a foreign key relation: ❌ Not feasible because of backwards compatibility.
  2. Only restore the event subscription when the process definition exists: ❌ Not doable because when recreating the subscription for the earlier version, the process definition does technically exist (present in the database).
  3. A redeployment of the process definition fixes the orphaned event subscription: 👍 Not only does it fix the issue, it is also exactly the behavior we have for Timer Events as of today. The orphan timer event job remains in the DB until a new deployment of the process definition. At that point, we delete obsolete jobs and create the new ones where applicable.

For this issue, the same behavior was implemented for subscriptions. Whenever we deploy a process definition:

  • We remove obsolete jobs.
  • We remove obsolete subscriptions. This includes:
    • Subscriptions for the previously latest deployed version that need to be removed (normal behavior).
    • Conflicting subscriptions for process definitions that are no longer in the DB (orphan subscriptions).
  • Create new jobs.
  • Create new subscriptions.
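
The subscription part of these deployment steps can be sketched as follows. This is an illustration under assumptions, not the code from the pull requests: `DeployCleanupSketch`, `Subscription`, and `deploy` are hypothetical names, jobs are left out, and each definition is assumed to have a single message start event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the deploy-time cleanup; not engine code.
class DeployCleanupSketch {

    static class Subscription {
        final String messageName;
        final String definitionId;
        Subscription(String messageName, String definitionId) {
            this.messageName = messageName;
            this.definitionId = definitionId;
        }
    }

    // Returns the subscriptions that should exist after deploying newDefinitionId,
    // whose message start event listens for messageName.
    // previousLatestId may be null when no earlier version exists.
    static List<Subscription> deploy(Set<String> existingDefinitionIds,
                                     List<Subscription> subscriptions,
                                     String previousLatestId,
                                     String newDefinitionId,
                                     String messageName) {
        List<Subscription> result = new ArrayList<>();
        for (Subscription s : subscriptions) {
            // Normal behavior: the previously latest version loses its subscription.
            boolean ownedByPreviousLatest = s.definitionId.equals(previousLatestId);
            // Orphan: same message name, but the referenced definition no longer exists.
            boolean conflictingOrphan = s.messageName.equals(messageName)
                    && !existingDefinitionIds.contains(s.definitionId);
            if (ownedByPreviousLatest || conflictingOrphan) {
                continue; // obsolete, drop it
            }
            result.add(s);
        }
        // Create the subscription for the newly deployed version.
        result.add(new Subscription(messageName, newDefinitionId));
        return result;
    }
}
```

In this model, an orphaned subscription left behind for x1 is dropped when a new version with the same message name is deployed, while subscriptions for unrelated definitions are kept.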

Note: For 7.19 only, jobs are not recreated when deleting a process definition version, so we cannot have the issue of orphan jobs. We can however have it for subscriptions, so we'll also be fixing it.

joaquinfelici, May 24 '24 08:05

On 7.19 and 7.20, the new tests are ignored for CRDB because the issue never occurs there in the first place: on concurrent modification, CRDB throws a CrdbTransactionRetryException (more info here).

joaquinfelici, May 28 '24 10:05