Task completion event lost

Open ravig-kant opened this issue 1 year ago • 3 comments

Describe the bug

We are facing an issue where a Conductor task remains in progress. This task executes in a do-while loop along with other tasks. The sequence of tasks in the do-while loop is as follows:

UploadPrepare -> Upload_collectItem_Output -> Upload_item_start -> Upload -> Upload_item_end

In the attached screenshot, for iteration 135, Upload_item_start__135 is IN_PROGRESS even though we have already marked it as COMPLETED. Marking it COMPLETED triggered the next task of the same iteration, Upload__135, which is also COMPLETED. This looks like a case of lost updates. As a result, the workflow never completes.

Details

  • Conductor version: 3.18
  • Persistence implementation: Postgres
  • Queue implementation: Dynoqueues
  • Lock: Redis
  • Workflow definition:
  • Task definition:
  • Event handler definition:

Expected behavior

The task and the workflow should have been completed.

Screenshots Screenshot 2024-06-04 at 2 07 39 PM

ravig-kant avatar Jun 04 '24 09:06 ravig-kant

Hi @ravig-kant what database backend are you using?

v1r3n avatar Jun 28 '24 08:06 v1r3n

We are using postgres as backend @v1r3n

ravig-kant avatar Jul 08 '24 14:07 ravig-kant

This is not a race condition in the persistence engine being used; it comes from the general design. In this example, the task emits a Kafka message, and the response that marks the task as complete arrives before the task has been marked as in progress. The remaining code on the original thread then runs and marks the task as in progress, moving it from complete -> in progress.

This behaviour would be the same with any persistence engine. It can only be fixed if the update logic itself handles this case (potentially through conditional updates).
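
To make the ordering concrete, here is a minimal in-memory sketch that replays the interleaving in a fixed order. It is not Conductor's code: the class, the `store` map, and `updateTask` are hypothetical stand-ins, and the Kafka round trip is simulated by a direct call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; this only models the ordering problem.
final class LostUpdateSketch {
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED }

    // Stand-in for the task table of any persistence engine.
    static final Map<String, Status> store = new ConcurrentHashMap<>();

    // Unconditional write: overwrites whatever status is currently stored.
    static void updateTask(String taskId, Status status) {
        store.put(taskId, status);
    }

    // Replays the interleaving from the comment above, step by step.
    static Status replay(String taskId) {
        store.put(taskId, Status.SCHEDULED);
        // 1. The task emits its message; the response arrives immediately
        //    and marks the task COMPLETED (simulated by a direct call).
        updateTask(taskId, Status.COMPLETED);
        // 2. The original thread resumes and records IN_PROGRESS,
        //    overwriting the completion.
        updateTask(taskId, Status.IN_PROGRESS);
        return store.get(taskId);
    }

    public static void main(String[] args) {
        System.out.println(replay("Upload_item_start__135")); // prints IN_PROGRESS
    }
}
```

Because the second write does not look at the stored status, swapping the map for a different backend would not change the outcome.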

aradu-atlassian avatar Jul 30 '24 00:07 aradu-atlassian

👋 Hi @ravig-kant @aradu-atlassian

We're currently reviewing open issues in the Conductor OSS backlog, and noticed that this issue hasn't been addressed.

To help us keep the backlog focused and actionable, we’d love your input:

  • Is this issue still relevant?
  • Has the problem been resolved in the latest version v3.21.12?
  • Do you have any additional context or updates to provide?

If we don’t hear back in the next 14 days, we’ll assume this issue is no longer active and will close it for housekeeping. Of course, if it's still a valid issue, just let us know and we’ll keep it open!

Thanks for contributing to Conductor OSS! We appreciate your support. 🙌

Jeff Bull

Developer Community Manager | Orkes

jeffbulltech avatar Feb 27 '25 01:02 jeffbulltech

Hi @jeffbulltech

This issue still exists. We had to duplicate the updateTask code in PostgresExecutionDAO and add a conditional status check to the update query, so that a task only moves to in-progress from a valid status. Ideally, though, this should be resolved in OSS itself.
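
For readers who want to picture that kind of guard, here is a rough sketch of a conditional status update. It is not the actual patch: the SQL string, the `task_state` table, its columns, and the rule that IN_PROGRESS may only be entered from SCHEDULED are all assumptions for illustration, and the real Conductor schema may differ. The in-memory method mimics the "rows affected" result that a guarded `UPDATE` would return.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from Conductor.
final class ConditionalUpdateSketch {
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED }

    // Assumed SQL shape: the WHERE clause rejects the write unless the
    // stored status is one the transition is allowed to start from.
    static final String GUARDED_UPDATE =
        "UPDATE task_state SET status = 'IN_PROGRESS' "
        + "WHERE task_id = ? AND status = 'SCHEDULED'";

    // Assumed rule: IN_PROGRESS may only be entered from SCHEDULED.
    static final Set<Status> MAY_START_FROM = EnumSet.of(Status.SCHEDULED);

    static final Map<String, Status> store = new ConcurrentHashMap<>();

    // In-memory analogue of the guarded UPDATE: returns 1 if the row
    // was changed and 0 if the guard rejected the write.
    static int updateIfAllowed(String taskId, Status next, Set<Status> allowedFrom) {
        boolean[] applied = {false};
        // computeIfPresent runs atomically per key on ConcurrentHashMap.
        store.computeIfPresent(taskId, (id, current) -> {
            if (allowedFrom.contains(current)) {
                applied[0] = true;
                return next;
            }
            return current; // leave the stored status untouched
        });
        return applied[0] ? 1 : 0;
    }

    public static void main(String[] args) {
        store.put("t", Status.COMPLETED); // completion arrived first
        int rows = updateIfAllowed("t", Status.IN_PROGRESS, MAY_START_FROM);
        System.out.println(rows + " " + store.get("t")); // prints 0 COMPLETED
    }
}
```

With the guard in place, the late IN_PROGRESS write becomes a no-op instead of overwriting the completion, and the caller can use the zero row count to detect that it lost the race.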

nmohan-atl avatar Mar 26 '25 16:03 nmohan-atl