[Improvement]: Improve stop and start workflow engine behaviour with celery executor
Currently, when the workflow engine is stopped, it updates the engine's global lock in the database and changes `_exec_steps` to immediately return the process instead of executing any steps.
This is fine for the threadpool executor, since processes start immediately and, when the engine stops, execution of the process in the thread has to be stopped.
For Celery, however, the worker does not stop after finishing its current process and instead continues until the Celery queue is empty. This leaves all the processes stuck in `RUNNING` instead of waiting in the queue as `RESUMED`.
We currently fix the `RUNNING` processes when the engine is started by changing their status to `RESUMED` and resuming them.
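For context, a minimal sketch of what that start-up fix could look like, assuming a SQLAlchemy-style session, a `ProcessTable` model with a `last_status` column, and a `resume_process()` helper; all of these names are illustrative, not the actual API:

```python
# Illustrative only: these imports are assumptions, not the real module paths.
from sqlalchemy import select
from sqlalchemy.orm import Session

from workflow_engine.db import ProcessTable          # assumed model
from workflow_engine.services import resume_process  # assumed helper


def recover_running_processes(session: Session) -> None:
    """On engine start, flip processes stuck in RUNNING back to RESUMED
    and hand them to the executor again."""
    stuck = session.scalars(
        select(ProcessTable).where(ProcessTable.last_status == "running")
    ).all()
    for process in stuck:
        process.last_status = "resumed"
    session.commit()
    for process in stuck:
        resume_process(process)  # re-submit to the configured executor
```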
Here are steps to improve this behaviour:
- [ ] Stop the worker after it has finished its current process.
- [ ] In `_exec_steps`, change the process back to `RESUMED` before stopping it and re-add it to the queue (see the sketch after this list).
- [ ] Remove the DB update that changes processes from `RUNNING` to `RESUMED`.
  - This might not be possible, since processes can get stuck in `RUNNING` status when a worker forcefully shuts down, and the only way to fix those processes is stopping and starting the workflow engine.
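A rough sketch of what the second item could look like inside the global-lock branch, assuming hypothetical `set_process_status()` and `requeue_process()` helpers; neither is claimed to exist in the codebase:

```python
# Hypothetical sketch: set_process_status and requeue_process are assumed
# helpers, not the actual implementation.
def _exec_steps(process):
    engine_status = get_engine_settings()
    if engine_status.global_lock:
        # Instead of returning while the process is still RUNNING, mark it
        # RESUMED and put it back on the queue so it is picked up again
        # once the engine starts, with no DB repair step needed.
        set_process_status(process, "resumed")
        requeue_process(process)
        return process
    ...  # normal step execution continues here
```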
The `_exec_steps` snippet that stops the process is located here:
```python
...
# Execute step
try:
    engine_status = get_engine_settings()
    if engine_status.global_lock:
        # Exiting from thread: workflow engine is Paused or Pausing
        consolelogger.info(
            "Not executing Step as the workflow engine is Paused. Process will remain in state 'running'"
        )
        return process
...
```
> Stop the worker after it has finished its current process.
Do you mean shut down completely? Wouldn't it be better for it to periodically poll whether the engine is enabled again?
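If polling is the preferred direction, one way it could look is a periodic task that pauses and resumes queue consumption using Celery's documented remote-control commands; the `"workflows"` queue name and the reuse of `get_engine_settings()` from the snippet above are assumptions for illustration:

```python
# Sketch of the polling alternative; queue name and get_engine_settings()
# (from the snippet above) are assumptions.
from celery import Celery

app = Celery("workflow_engine")


@app.task
def watch_engine_lock() -> None:
    """Run periodically (e.g. via celery beat): stop consuming the workflow
    queue while the engine is paused, start again once it is enabled."""
    engine_status = get_engine_settings()
    if engine_status.global_lock:
        app.control.cancel_consumer("workflows")  # workers stop pulling new tasks
    else:
        app.control.add_consumer("workflows")     # workers resume pulling tasks
```

Note that `cancel_consumer` only stops workers from fetching new tasks; a task already executing still runs to completion, which matches the "stop after its current process" requirement. A full warm shutdown via `app.control.shutdown()` would be the other documented option.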
> In `_exec_steps`, change the process back to `RESUMED` before stopping it and re-add it to the queue.
What's the benefit over just leaving it in the `RUNNING` state instead?