camunda-bpm-platform
When deleting a process definition, message event subscription is not removed
This issue was imported from JIRA:
Field | Value |
---|---|
JIRA Link | CAM-12110 |
Reporter | @tasso94 |
Has restricted visibility comments | true |
Environment (Required on creation)
Camunda <= 7.21.
Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)
When the last two versions of a process definition with a message start event are deleted in parallel, the message event subscription of the latest process definition is not removed and becomes orphaned.
Steps to reproduce (Required on creation)
Given
- Thread `t1` deletes process definition `x` in version 1 (`x1`)
- Thread `t2` deletes process definition `x` in version 2 (`x2`)
- Thread `t1` already executed the command, so the thread is about to flush and commit the DB changes
Scenario
- `t2` determines the new latest process definition -> `x1` is considered to be the new latest process definition
- `t2` deletes `x2`
- `t2` starts to ensure consistency with respect to the new latest process definition `x1`
- `t2` resolves `x1` from the cache
- `t2` makes sure that existing event subscriptions get persisted eventually
- `t1` flushes and commits -> The process definition got deleted from the database and got removed from the cache
- `t2` flushes and commits -> The process definition got deleted from the database and got removed from the cache
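The interleaving above can be replayed deterministically with a small in-memory model. This is only a sketch under stated assumptions: the class `OrphanedSubscriptionRace`, its fields, and the message name `startMessage` are hypothetical and are not engine classes or real data. Each thread's pending changes are modeled as a list of actions that run on flush.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the race; none of these names are engine classes.
class OrphanedSubscriptionRace {

    // Stand-in for the process definition table.
    final Set<String> definitions = new HashSet<>(Arrays.asList("x1", "x2"));
    // Stand-in for the process definition cache.
    final Set<String> cache = new HashSet<>(Arrays.asList("x1", "x2"));
    // Stand-in for ACT_RU_EVENT_SUBSCR: message name -> process definition id.
    final Map<String, String> subscriptions = new HashMap<>();

    void run() {
        // Only the latest version (x2) holds the message start subscription.
        subscriptions.put("startMessage", "x2");

        // Pending, not yet flushed changes of each thread.
        List<Runnable> t1 = new ArrayList<>();
        List<Runnable> t2 = new ArrayList<>();

        // Given: t1 already executed "delete x1" but has not flushed yet.
        t1.add(() -> { definitions.remove("x1"); cache.remove("x1"); });

        // t2 determines the new latest definition; x1 is still visible.
        String newLatest = definitions.contains("x1") ? "x1" : null;

        // t2 deletes x2 together with its subscription.
        t2.add(() -> {
            definitions.remove("x2");
            cache.remove("x2");
            subscriptions.remove("startMessage", "x2");
        });

        // t2 resolves x1 from the cache and schedules the subscription for it.
        if (newLatest != null && cache.contains(newLatest)) {
            t2.add(() -> subscriptions.put("startMessage", newLatest));
        }

        // t1 flushes and commits, then t2 flushes and commits.
        t1.forEach(Runnable::run);
        t2.forEach(Runnable::run);
    }

    public static void main(String[] args) {
        OrphanedSubscriptionRace race = new OrphanedSubscriptionRace();
        race.run();
        System.out.println("definitions: " + race.definitions);
        System.out.println("subscriptions: " + race.subscriptions);
    }
}
```

In this model both definitions end up deleted while a subscription pointing to `x1` remains, which mirrors the orphaned row described in this ticket.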
Observed Behavior (Required on creation)
- The process definition `x1` got deleted
- The process definition `x2` got deleted
- But there exists an orphaned event subscription in the table `ACT_RU_EVENT_SUBSCR` pointing to `x1`
- When somebody now tries to deploy a process with the same message name, then that deployment will fail
Expected behavior (Required on creation)
Depends on the chosen solution:
- Synchronize deletion so that `t2` fails with an `OptimisticLockingException`.
- Ensure that the event subscription is not restored when the process definition has been deleted.
Root Cause (Required on prioritization)
`ACT_RU_EVENT_SUBSCR` rows have no foreign key relation to the process definition.
Solution Ideas
- Event subscription points to process definition with a foreign key relation.
  - Like this, an `OptimisticLockingException` is thrown.
- Only restore the event subscription when the process definition exists.
  - Like this, an orphaned event subscription is not restored.
- A redeployment of the process definition fixes the orphaned event subscription.
  - The orphaned event subscription is updated with the process definition id and potentially other information.
Hints
Check if this problem exists for the following scenarios as well:
- Signal start event.
- Conditional start event.
- Other event subscriptions than for start events.
- Timer start event (timer declaration).
- Other timer declarations.
Links
- https://jira.camunda.com/browse/SUPPORT-8032
- https://jira.camunda.com/browse/SUPPORT-20072
Breakdown
### Pull Requests
- [x] Failing test case: https://github.com/camunda/camunda-bpm-platform/pull/4065
- [ ] https://github.com/camunda/camunda-bpm-platform/pull/4364
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1227
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1228
- [ ] https://github.com/camunda/camunda-bpm-platform-maintenance/pull/1229
Dev2QA handover
- [ ] Does this ticket need a QA test and the testing goals are not clear from the description? Add a Dev2QA handover comment
Out of the proposed solutions, these are the results:
- Event subscription points to process definition with a foreign key relation: ❌ Not feasible because of backwards compatibility.
- Only restore the event subscription when the process definition exists: ❌ Not doable because when recreating the subscription for the earlier version, the process definition does technically exist (present in the database).
- A redeployment of the process definition fixes the orphaned event subscription: 👍 Not only does it fix the issue, but this is exactly the behavior we have for Timer Events as of today. The orphan timer event job remains in the DB until a new deployment of the process definition. At this point, we delete obsolete jobs and create the new ones where applicable.
For this issue, the same behavior was implemented for subscriptions. Whenever we deploy a process definition:
- We remove obsolete jobs.
- We remove obsolete subscriptions. This includes:
- Subscriptions for the previously latest deployed version that need to be removed (normal behavior).
- Conflicting subscriptions for process definitions that are no longer in the DB (orphan subscriptions).
- Create new jobs.
- Create new subscriptions.
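The subscription part of these steps could look roughly like the sketch below. It is not the engine's implementation: `DeploymentSubscriptionCleanup`, `Subscription`, and `deploy` are hypothetical names, subscriptions are reduced to a message name and a process definition id, and "conflicting" is assumed to mean "same message name".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these are not engine classes.
class DeploymentSubscriptionCleanup {

    // Simplified stand-in for a row in ACT_RU_EVENT_SUBSCR.
    static final class Subscription {
        final String messageName;
        final String processDefinitionId;

        Subscription(String messageName, String processDefinitionId) {
            this.messageName = messageName;
            this.processDefinitionId = processDefinitionId;
        }
    }

    // Returns the subscriptions that exist after deploying a new definition.
    // previousLatestId may be null when no earlier version exists.
    static List<Subscription> deploy(List<Subscription> current,
                                     Set<String> existingDefinitionIds,
                                     String previousLatestId,
                                     String newDefinitionId,
                                     String messageName) {
        List<Subscription> result = new ArrayList<>();
        for (Subscription s : current) {
            // Normal behavior: drop subscriptions of the previously latest version.
            boolean ofPreviousLatest = s.processDefinitionId.equals(previousLatestId);
            // Orphan: same message name (assumption), definition no longer in the DB.
            boolean orphan = s.messageName.equals(messageName)
                    && !existingDefinitionIds.contains(s.processDefinitionId);
            if (!ofPreviousLatest && !orphan) {
                result.add(s);
            }
        }
        // Create the subscription for the newly deployed definition.
        result.add(new Subscription(messageName, newDefinitionId));
        return result;
    }
}
```

With an orphaned subscription for a deleted `x1` and an empty set of existing definitions, deploying a hypothetical `x3` with the same message name leaves only the subscription for `x3`, so the deployment no longer collides with the orphan.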
Note: For 7.19 only, jobs are not recreated when deleting a process definition version, so we cannot have the issue of orphan jobs. We can however have it for subscriptions, so we'll also be fixing it.
On 7.19 and 7.20, the new tests are ignored for CRDB because the issue never occurs in the first place. This is the case because on concurrent modification, CRDB throws a `CrdbTransactionRetryException` (more info here).