
Bug: failureWorkflow marked as FAILED despite completing successfully

Open albert-cg opened this issue 7 months ago • 2 comments

Describe the bug
I have a main workflow configured with a failureWorkflow. When the main workflow fails, the specified failureWorkflow is triggered correctly. All tasks in the failureWorkflow complete, but the workflow itself is still marked as FAILED and shows the error message from the main workflow.
This behavior did not happen before — the failureWorkflow used to be marked as COMPLETED when it finished successfully.

Additionally, this issue occurs intermittently — sometimes the failureWorkflow is correctly marked as COMPLETED, and other times it is incorrectly marked as FAILED.

This causes confusion in monitoring and error handling, since fallback workflows that complete successfully are incorrectly reported as failed.

Details
Conductor version: 3.21.14
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: N/A

To Reproduce
Steps to reproduce the behavior:

  1. Create a workflow A that fails.
  2. Configure a failureWorkflow B.
  3. When A fails, B is triggered as expected.
  4. B finishes successfully, but its overall status is sometimes FAILED and shows the error message from A, even though all tasks in B completed.
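
The steps above could be set up with a pair of workflow definitions like the following. This is only a minimal sketch, not the reporter's actual workflows: the workflow and task names are hypothetical, and it assumes that a TERMINATE task with terminationStatus FAILED is enough to fail A and trigger B in your setup. Only the field names (name, version, schemaVersion, failureWorkflow, tasks, taskReferenceName, type, inputParameters) follow Conductor's workflow definition format.

```python
import json

# Hypothetical name for the failure workflow (B); not taken from the issue.
FAILURE_WORKFLOW_NAME = "workflow_b_on_failure"

# Workflow A: fails on purpose and points at B through "failureWorkflow".
workflow_a = {
    "name": "workflow_a_always_fails",  # hypothetical name
    "version": 1,
    "schemaVersion": 2,
    "failureWorkflow": FAILURE_WORKFLOW_NAME,
    "tasks": [
        {
            "name": "force_failure",
            "taskReferenceName": "force_failure_ref",
            "type": "TERMINATE",
            "inputParameters": {
                # Assumption: terminating with FAILED triggers the failureWorkflow.
                "terminationStatus": "FAILED",
                "terminationReason": "Deliberate failure to trigger the failureWorkflow",
            },
        }
    ],
}

# Workflow B: a single system task that should always complete.
workflow_b = {
    "name": FAILURE_WORKFLOW_NAME,
    "version": 1,
    "schemaVersion": 2,
    "tasks": [
        {
            "name": "record_handled",
            "taskReferenceName": "record_handled_ref",
            "type": "SET_VARIABLE",
            "inputParameters": {"handled": True},
        }
    ],
}


def to_json(definition):
    """Serialize a definition so it can be registered with the server."""
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(to_json(workflow_a))
    print(to_json(workflow_b))
```

After registering both definitions and starting A several times, the status of each resulting B execution can be compared against the statuses of its tasks.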

Expected behavior
When a failureWorkflow completes successfully, it should always be marked with status COMPLETED, regardless of the failure reason or message from the original workflow.
The result status of the failureWorkflow should be independent and reflect its actual execution outcome.
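
Until the cause is found, a monitoring script could flag affected executions by comparing the workflow status with its task statuses. The helper below is a hedged sketch: find_status_mismatch is a hypothetical name, and it assumes the execution has already been fetched as a dict carrying the usual status, reasonForIncompletion, and tasks fields. The sample data in the usage part is invented for illustration.

```python
def find_status_mismatch(execution):
    """Return the recorded failure reason if the workflow is FAILED although
    every one of its tasks is COMPLETED; otherwise return None.

    `execution` is assumed to be a workflow execution already parsed into a dict.
    """
    tasks = execution.get("tasks", [])
    # An execution with no tasks is not treated as a mismatch.
    all_tasks_completed = bool(tasks) and all(
        task.get("status") == "COMPLETED" for task in tasks
    )
    if all_tasks_completed and execution.get("status") == "FAILED":
        return execution.get("reasonForIncompletion") or "(no reason recorded)"
    return None


if __name__ == "__main__":
    # Invented sample resembling the behaviour described in this issue.
    suspicious = {
        "status": "FAILED",
        "reasonForIncompletion": "error message inherited from workflow A",
        "tasks": [{"status": "COMPLETED"}, {"status": "COMPLETED"}],
    }
    print(find_status_mismatch(suspicious))
```

A non-None result means the execution matches the pattern reported here: the overall status is FAILED while no task in the workflow actually failed.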

Screenshots
Observed behaviour: [four images]
Expected behaviour: [four images]

albert-cg, May 15 '25 14:05

I ran swf_utils_notify_error in isolation. All of its tasks completed, but this message still appears:

Failed to invoke HTTP task due to: java.lang.Exception: 401 : [no body]'

This looks like it could be a bug in HTTP tasks: the error reports "[no body]" even though the task itself completes correctly.

albert-cg, May 23 '25 11:05

Hey @albert-cg, thanks for bringing this up! I've added this issue to the roadmap and will share it with our engineering team.

jeffbulltech, Jul 07 '25 16:07