Workflow stuck without any errors
Describe the bug
Conductor does not execute the next task in some workflows. Execution continues only after the workflow is manually re-run. Is it possible that, because of e.g. DB connection errors (I see some connection errors in the logs), Conductor takes a task from the queue and then fails to complete some step, so the workflow is never re-scheduled?
I queried the data for this workflow, and it has no entry in the task_in_progress table:
select * from queue_message where message_id = '431c7230-9778-483c-b004-9ddadc354822';
created_on | deliver_on | queue_name | message_id | priority | popped | offset_time_seconds | payload
----------------------------+----------------------------+---------------+--------------------------------------+----------+--------+---------------------+---------
2025-03-31 13:50:05.184872 | 2025-03-31 13:50:35.184872 | _deciderQueue | 6d0c7230-9778-483c-b004-9ddadc3548b9 | 0 | f | 30 |
select * from task_in_progress where workflow_id = '431c7230-9778-483c-b004-9ddadc354822';
created_on | modified_on | task_def_name | task_id | workflow_id | in_progress_status
------------+-------------+---------------+---------+-------------+--------------------
Additional context:
SELECT COUNT(*) FROM queue_message WHERE queue_name = '_deciderQueue';
count
-------
24937
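To tell whether this backlog consists of messages that are actually overdue, rather than ones merely scheduled for the future, a query along these lines may help. It is only a sketch: it uses just the columns visible in the queue_message output above, the 5-minute threshold and the LIMIT are arbitrary assumptions, and it assumes deliver_on is stored in the same time zone as the database session.

```sql
-- List decider messages that were due more than 5 minutes ago
-- but have not been popped yet (oldest first).
-- The interval and LIMIT are illustrative values, not Conductor defaults.
SELECT message_id,
       created_on,
       deliver_on,
       now() - deliver_on AS overdue_by
FROM queue_message
WHERE queue_name = '_deciderQueue'
  AND popped = false
  AND deliver_on < now() - interval '5 minutes'
ORDER BY deliver_on
LIMIT 50;
```

If most of the rows in the queue show up here, the decider queue is probably not being drained; if only a few do, the count above may mostly reflect workflows waiting for their next scheduled evaluation.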
If I pause/resume the workflow it gets decided and continues execution.
Details
Conductor version: 3.21.12
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: Redis
To Reproduce
Happens from time to time.
Expected behavior
Workflow tasks do not get stuck.
This has occurred for me as well as for @astelmashenko, many times. @v1r3n Could it be caused by a large volume of data? It wasn't happening initially.
@bradyyie @kgoeltner @v1r3n please review
Please check related PR to the issue https://github.com/conductor-oss/conductor/pull/515