
Workflow stuck without any errors

Status: Open · astelmashenko opened this issue 9 months ago · 3 comments

Describe the bug
Conductor does not execute the next task in some workflows; execution only continues after the workflow is manually re-run. Is it possible that, because of e.g. DB connection errors (I see some in the logs), Conductor takes a task from the queue and then fails to complete some step, so the workflow is never re-scheduled?
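A simplified model of the hypothesized failure mode, assuming (not verified against Conductor's source) that a pop marks the row as popped and pushes deliver_on forward by an unack timeout, that a successful decide acks (deletes) the message, and that a background sweep returns expired, unacknowledged messages to the queue. The class and method names are hypothetical; only the popped and deliver_on fields mirror the queue_message columns shown below.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueueMessage:
    """Simplified stand-in for one row of the queue_message table."""
    message_id: str
    deliver_on: datetime
    popped: bool = False


class ToyQueue:
    """Hypothetical in-memory queue with pop/ack/unack-sweep semantics.

    Illustrates the failure mode only; this is not Conductor's implementation.
    """

    def __init__(self, unack_timeout: timedelta):
        self.unack_timeout = unack_timeout
        self.messages: dict[str, QueueMessage] = {}

    def push(self, message_id: str, now: datetime) -> None:
        self.messages[message_id] = QueueMessage(message_id, deliver_on=now)

    def pop(self, now: datetime) -> list[str]:
        """Mark every due, unpopped message as popped and return its id."""
        popped = []
        for msg in self.messages.values():
            if not msg.popped and msg.deliver_on <= now:
                msg.popped = True
                # The consumer has unack_timeout to ack before redelivery is allowed.
                msg.deliver_on = now + self.unack_timeout
                popped.append(msg.message_id)
        return popped

    def ack(self, message_id: str) -> None:
        """Processing succeeded: remove the message for good."""
        self.messages.pop(message_id, None)

    def sweep_unacked(self, now: datetime) -> int:
        """Return popped-but-never-acked messages to the queue once their timeout passes."""
        count = 0
        for msg in self.messages.values():
            if msg.popped and msg.deliver_on <= now:
                msg.popped = False
                count += 1
        return count


if __name__ == "__main__":
    t0 = datetime(2025, 3, 31, 13, 50, 5)
    q = ToyQueue(unack_timeout=timedelta(seconds=30))
    q.push("wf-1", t0)
    q.pop(t0)  # the consumer takes the message...
    # ...then hits a DB error and never calls q.ack("wf-1").
    later = t0 + timedelta(minutes=5)
    print(q.pop(later))            # still popped, so nothing is delivered
    print(q.sweep_unacked(later))  # the sweep resets the expired message
    print(q.pop(later))            # and it is delivered again
```

In this model, a DB error between pop and ack only leaves the workflow stuck if the unack sweep never runs or never matches the row; otherwise the message is redelivered after the timeout.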

I queried the data for this workflow: there is a message in queue_message, but no row in the task_in_progress table:

select * from queue_message where message_id = '431c7230-9778-483c-b004-9ddadc354822';
         created_on         |         deliver_on         |  queue_name   |              message_id              | priority | popped | offset_time_seconds | payload 
----------------------------+----------------------------+---------------+--------------------------------------+----------+--------+---------------------+---------
 2025-03-31 13:50:05.184872 | 2025-03-31 13:50:35.184872 | _deciderQueue | 6d0c7230-9778-483c-b004-9ddadc3548b9 |        0 | f      |                  30 | 



select * from task_in_progress where workflow_id = '431c7230-9778-483c-b004-9ddadc354822';
 created_on | modified_on | task_def_name | task_id | workflow_id | in_progress_status 
------------+-------------+---------------+---------+-------------+--------------------

Additional context:

SELECT COUNT(*) FROM queue_message WHERE queue_name = '_deciderQueue';
 count 
-------
 24937
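To tell whether the decider queue is merely backlogged or messages are being lost, one could bucket rows by popped and deliver_on. The row above has popped = f and a deliver_on of 2025-03-31 13:50:35, so at any later time it counts as due but not yet picked up; with 24,937 messages in _deciderQueue, a large bucket of such rows would point to the consumer of that queue falling behind rather than to a single lost message. A minimal Python sketch of that bucketing, with the row values pasted in from the query output and a hypothetical check time; the same split could be computed in Postgres by grouping on popped and on deliver_on < now().

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    """The queue_message columns needed for this check (values copied from a SELECT)."""
    message_id: str
    deliver_on: datetime
    popped: bool


def classify(row: Row, now: datetime) -> str:
    """Bucket one message by its delivery state at time `now`."""
    if row.popped:
        return "popped"     # handed to a consumer, awaiting ack
    if row.deliver_on <= now:
        return "overdue"    # due for delivery, but nobody has popped it yet
    return "scheduled"      # deliver_on is still in the future


def summarize(rows: Iterable[Row], now: datetime) -> Counter:
    """Count messages per bucket."""
    return Counter(classify(r, now) for r in rows)


if __name__ == "__main__":
    row = Row(
        message_id="6d0c7230-9778-483c-b004-9ddadc3548b9",
        deliver_on=datetime(2025, 3, 31, 13, 50, 35, 184872),
        popped=False,
    )
    # Hypothetical check time, a few days after the message became due.
    print(summarize([row], now=datetime(2025, 4, 4, 11, 0)))
```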

If I pause and then resume the workflow, it gets decided and continues execution.
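When several workflows are affected, the pause/resume workaround can be scripted against the server's REST API. A minimal Python sketch, assuming the server is reachable at http://localhost:8080/api (adjust to your deployment) and exposes PUT /workflow/{workflowId}/pause and PUT /workflow/{workflowId}/resume; verify both paths against the API docs for your version. It only repeats the manual workaround and does not fix the underlying cause.

```python
import urllib.request
from typing import List

# Assumption: base URL of the Conductor server's REST API.
BASE_URL = "http://localhost:8080/api"


def workaround_requests(workflow_id: str, base_url: str = BASE_URL) -> List[urllib.request.Request]:
    """Build the pause and resume calls, in that order, for one workflow."""
    return [
        urllib.request.Request(f"{base_url}/workflow/{workflow_id}/{action}", method="PUT")
        for action in ("pause", "resume")
    ]


def pause_and_resume(workflow_id: str, base_url: str = BASE_URL, timeout: float = 10.0) -> None:
    """Send both requests; urlopen raises urllib.error.HTTPError on an error status."""
    for req in workaround_requests(workflow_id, base_url):
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()


if __name__ == "__main__":
    pause_and_resume("431c7230-9778-483c-b004-9ddadc354822")
```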

Details
Conductor version: 3.21.12
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: Redis

To Reproduce
Happens intermittently, from time to time.

Expected behavior
Workflow tasks do not get stuck.


astelmashenko · Apr 04 '25 11:04

This has happened to me and @astelmashenko quite a few times. @v1r3n, could it be caused by the large amount of data? It wasn't happening initially.

vishal079 · Jun 02 '25 07:06

@bradyyie @kgoeltner @v1r3n please review

jeffbulltech · Jun 06 '25 17:06

Please check the related PR for this issue: https://github.com/conductor-oss/conductor/pull/515

astelmashenko · Jun 06 '25 20:06