argo-workflows icon indicating copy to clipboard operation
argo-workflows copied to clipboard

LifeCycle hooks phase not updated at workflow level

Open Piroddi opened this issue 2 years ago • 14 comments

Summary

The phase of LifeCycle hooks configured at a workflow level are not updated once completed. They are stuck in pending state forever.

This prevents workflows from being retried, which negatively impacts long running workflows with lots of steps

What happened/what you expected to happen?

Lifecycle hook phase should change to Succeeded once completed

What version are you running?

v3.3.3

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

Piroddi avatar Apr 28 '22 19:04 Piroddi

it is known bug, in hooks, the workflow will not be reconciled if it is already completed. I will try to come up with fix.

sarabala1979 avatar Apr 29 '22 06:04 sarabala1979

@sarabala1979 this is not a popular bug. I'd like someone from the community to try and fix this, rather than the core team, as our goal should always to make the greatest positive impact for our customers.

Could I ask you to add some guidance for the engineer trying to fix this?

Specifically, link to the exact line of code where you think changes need making.

alexec avatar May 05 '22 19:05 alexec

@sarabala1979 please ignore my last comment. You have done this already with the linked PR.

@Piroddi would you be willing to complete @sarabala1979 PoC PR?

alexec avatar May 05 '22 19:05 alexec

Hi @alexec - when can we expect the changes made in #8586 to get merged. Thanks 😄

sandeepk8s avatar May 23 '22 16:05 sandeepk8s

We are affected by this bug I guess! Using HTTP templates with lifecycle hooks to send our CI workflows build status to bitbucket builds. All the hooks are custom.

hooks:
    failed:
      template: notify
      arguments:
        parameters:
          - name: state
            value: FAILED
          - name: tohash
            value: '{{workflow.parameters.tohash}}'
      expression: workflow.status == "Failed"
    running:
      template: notify
      arguments:
        parameters:
          - name: state
            value: INPROGRESS
          - name: tohash
            value: '{{workflow.parameters.tohash}}'
      expression: workflow.status == "Running"
    success:
      template: notify
      arguments:
        parameters:
          - name: state
            value: SUCCESSFUL
          - name: tohash
            value: '{{workflow.parameters.tohash}}'
      expression: workflow.status == "Succeeded"

We have multiple apps and repos - use only CI workflows pipepine for all. When the wfs take more than 2 minutes - it is affecting and showing the wrong status. Expected Result: image

Actual Result: image

sandeepk8s avatar May 25 '22 19:05 sandeepk8s

@sandeepitachi someone needs to take over #8586. Would you like to?

alexec avatar May 25 '22 20:05 alexec

@alexec I'd love to but I'm not great at coding. 😄 Also not much familiar with Go.

sandeepk8s avatar May 25 '22 23:05 sandeepk8s

Go is VERY easy to learn. It is designed to be. Took me about 3h to learn.

  1. Spend 3h doing http://tour.golang.org
  2. You are now a Go programmer.

alexec avatar May 26 '22 15:05 alexec

Thank you @alexec 😄 Will spend some time this weekend for sure! Would like to contribute!

Also spoke to my architect @brgoode too (he's good at Go) BUT like you - he's super BUSY working with different teams in our org. If he got some time - he's happy to look at it too

sandeepk8s avatar May 26 '22 15:05 sandeepk8s

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 12:06 stale[bot]

I believe this is still a known bug, so I don't think it should be closed.

roofurmston avatar Jun 13 '22 09:06 roofurmston

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Jul 10 '22 06:07 stale[bot]

I believe this is still a known bug, so I don't think it should be closed. Would be good to know if this is not the case. Otherwise, I think it should stay open.

roofurmston avatar Jul 11 '22 08:07 roofurmston

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Jul 31 '22 04:07 stale[bot]

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

stale[bot] avatar Aug 12 '22 13:08 stale[bot]

I'm seeing this issue, and I also notice when bringing up the logs for the hook that the logs UI continuously appends the same logs to the displayed log output every second. Closing the logs window and re-opening it clears the logs and then starts continuously appending the same logs once again.

bencompton avatar Aug 16 '22 21:08 bencompton

Lifecycle hook phase should change to Succeeded once completed

We're seeing the same behavior for workflow.status == "Failed"

jackivanov avatar Aug 17 '22 12:08 jackivanov

This has been affecting our team also. Seems like a fix is in, thanks @sarabala1979. Will it make 3.4?

ezk84 avatar Aug 25 '22 10:08 ezk84

I still have the issue testing in v3.4.0-rc4 where it's supposed to have included the fix. cc: @sarabala1979

hooks with expression workflow.status == "Failed" or workflow.status == "Succeeded" remains with PHASE=Pending despite they are being executed and finished fine.

Should we open a new issue?

example:

spec:
  templates:
    - name: main
      steps:
        - - name: docker-go-test
            templateRef:
              name: docker-go-test
              template: unit-test
  entrypoint: main
  hooks:
    failed:
      templateRef:
        name: notifier
        template: notifier-failed
      expression: workflow.status == "Failed"
    running:
      templateRef:
        name: notifier
        template: notifier-running
      expression: workflow.status == "Running"
    succeeded:
      templateRef:
        name: notifier
        template: notifier-succeeded
      expression: workflow.status == "Succeeded"

image

ese avatar Sep 14 '22 12:09 ese

As workaround you can use exit handlers https://argoproj.github.io/argo-workflows/walk-through/exit-handlers/ instead hooks for failed and succeeded which works fine updating phase after workflow ends: example:

spec:
  onExit: exit-handler
  templates:
    - name: exit-handler
      steps:
        - - name: succeeded
            templateRef:
              name: notifier
              template: notifier-succeed
            when: '{{workflow.status}} == Succeeded'
          - name: failed
            templateRef:
              name: notifier
              template: notifier-failed
            when: '{{workflow.status}} != Succeeded'

ese avatar Sep 16 '22 08:09 ese

As workaround you can use exit handlers https://argoproj.github.io/argo-workflows/walk-through/exit-handlers/ instead hooks for failed and succeeded which works fine updating phase after workflow ends: example:

spec:
  onExit: exit-handler
  templates:
    - name: exit-handler
      steps:
        - - name: succeeded
            templateRef:
              name: notifier
              template: notifier-succeed
            when: '{{workflow.status}} == Succeeded'
          - name: failed
            templateRef:
              name: notifier
              template: notifier-failed
            when: '{{workflow.status}} != Succeeded'

it also do not work.

cheery550 avatar Sep 19 '22 15:09 cheery550

@sarabala1979 I have just upgraded to 3.4.2. However this bug is not resolved. Experiencing the same behaviour

Piroddi avatar Oct 31 '22 08:10 Piroddi

Still an issue. Simple example:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
      failure:
        expression: workflow.status == "Failed"
        template: heads
  templates:
    - name: main
      steps:
      - - name: step1
          template: fail
    
    - name: fail
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["exit 1"]

    - name: heads
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"running hook\""]

Piroddi avatar Dec 04 '22 18:12 Piroddi

Hi @alexec @sarabala1979 I have time to look into this issue. It's really effecting us, as we cant retry steps in long running workflows.

Should I create a new issue or can you reopen this one?

Piroddi avatar Dec 04 '22 18:12 Piroddi

I believe this is still a known bug, so I don't think it should be closed.

elim19 avatar Feb 15 '23 06:02 elim19

I can confirm is still an issue in v3.4.5

ese avatar Feb 21 '23 14:02 ese

I can confirm as well. v3.4.5

capacman avatar Mar 14 '23 20:03 capacman

This is still an issue in v3.4.8

Mastergalen avatar Jun 25 '23 20:06 Mastergalen

The issue got resolved for me in v3.4.7, from the following fix:

https://github.com/argoproj/argo-workflows/pull/10307

Piroddi avatar Jun 26 '23 07:06 Piroddi

Strange, using the workflow level example I am still seeing the hooks stuck in the Pending state.

image

Workflow YAML

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
    exit: # Exit handler
      template: http
    running:
      expression: workflow.status == "Running"
      template: http
  templates:
    - name: main
      steps:
      - - name: step1
          template: heads
    
    - name: heads
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"it was heads\""]
    
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

Mastergalen avatar Jun 26 '23 08:06 Mastergalen