When I am using hooks, when I stop this workflow, it does not stop but keeps running.

Open LingClassmate opened this issue 1 year ago • 6 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [X] I can confirm the issue exists when I tested with :latest
  • [ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

When I use hooks and then stop the workflow, it does not stop but keeps running.

Workflow configuration file:

https://github.com/argoproj/argo-workflows/blob/45730a9cdeb588d0e52b1ac87b6e0ca391a95a81/examples/life-cycle-hooks-tmpl-level.yaml

I stopped the workflow, but the hook nodes have stayed in phase 'Pending' ever since.

Snipaste_2022-09-21_11-03-20.png

Version

v3.3.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-tmpl-level-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: step-1
            hooks:
              exit:
                # Expr will not support `-` on variable name. Variable should wrap with `[]`
                expression: steps["step-1"].status == "Running"
                template: http
              success:
                expression: steps["step-1"].status == "Succeeded"
                template: http
            template: echo
        - - name: step2
            hooks:
              exit:
                expression: steps.step2.status == "Running"
                template: http
              success:
                expression: steps.step2.status == "Succeeded"
                template: http
            template: echo

    - name: echo
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["sleep 30 && echo \"it was heads\""]

    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

Logs from the workflow controller

[root@k8s-master01 ~]# kubectl logs -n argo deploy/workflow-controller | grep lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.089Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.089Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.089Z" level=info msg="task-result changed" namespace=argo nodeID=lifecycle-hook-tmpl-level-frx69-2640744148 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.092Z" level=info msg="Running OnExit handler" lifeCycleHook="&LifecycleHook{Template:http,Arguments:Arguments{Parameters:[]Parameter{},Artifacts:[]Artifact{},},TemplateRef:nil,Expression:steps["step-1"].status == "Running",}" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.093Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.093Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.093Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.120Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=7912549 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.030Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.030Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg="Running OnExit handler" lifeCycleHook="&LifecycleHook{Template:http,Arguments:Arguments{Parameters:[]Parameter{},Artifacts:[]Artifact{},},TemplateRef:nil,Expression:steps["step-1"].status == "Running",}" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.997Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.997Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg="Running OnExit handler" lifeCycleHook="&LifecycleHook{Template:http,Arguments:Arguments{Parameters:[]Parameter{},Artifacts:[]Artifact{},},TemplateRef:nil,Expression:steps["step-1"].status == "Running",}" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
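
The controller log above repeats the same few messages on every reconciliation pass ("Running OnExit handler", "not yet completed", and so on). Grouping lines by their `msg` field makes that kind of loop easier to spot. The following is only a rough sketch using the Go standard library: `extractMsg` and `countMessages` are hypothetical helpers, not part of argo-workflows, and the pattern handles only the two forms seen in this log (`msg="..."` and `msg=word`). The sample lines are copied from the log above.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// msgPattern matches the msg field of a text log line, either quoted
// (msg="...") or bare (msg=word). It is a simplification for this log only.
var msgPattern = regexp.MustCompile(`\bmsg=(?:"([^"]*)"|(\S+))`)

// extractMsg returns the msg value of a log line, or "" if there is none.
func extractMsg(line string) string {
	m := msgPattern.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	if m[1] != "" {
		return m[1]
	}
	return m[2]
}

// countMessages tallies how often each msg value occurs.
func countMessages(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if msg := extractMsg(line); msg != "" {
			counts[msg]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		`time="2022-09-21T03:01:27.093Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69`,
		`time="2022-09-21T03:01:27.093Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69`,
		`time="2022-09-21T03:01:37.031Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69`,
		`time="2022-09-21T03:01:37.031Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69`,
	}

	counts := countMessages(lines)

	// Sort the messages so the output order is stable.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%3d  %s\n", counts[k], k)
	}
}
```

A message that keeps recurring with the same node ID across passes, as "not yet completed" does here, suggests the controller is waiting on a node that never finishes.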

Logs from your workflow's wait container

[root@k8s-master01 ~]# kubectl logs -c wait -l workflows.argoproj.io/workflow=lifecycle-hook-tmpl-level-frx69,workflow.argoproj.io/phase!=Succeeded
No resources found in default namespace.

LingClassmate avatar Sep 21 '22 03:09 LingClassmate

@LingClassmate Can you try with v3.4.0 or v3.3.9?

sarabala1979 avatar Sep 23 '22 23:09 sarabala1979

@sarabala1979 The problem still exists in v3.3.9: after the stop instruction is sent, the workflow never ends and stays in the Running state.

Snipaste_2022-09-26_16-20-03.png

In v3.4.0, the workflow likewise never ends after the stop instruction is sent and stays in the Running state. It also causes the workflow-controller container to stop, with the following error log:

time="2022-09-26T07:53:44.793Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg="task-result changed" namespace=argo nodeID=lifecycle-hook-tmpl-level-z4pm2-1774425000 workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg=updateAgentPodStatus namespace=argo workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg=assessAgentPodStatus namespace=argo podName=lifecycle-hook-tmpl-level-z4pm2-1340600742-agent
time="2022-09-26T07:53:44.793Z" level=info msg="Terminating pod as part of workflow shutdown" namespace=argo podName=lifecycle-hook-tmpl-level-z4pm2-1340600742-agent shutdownStrategy=Stop workflow=lifecycle-hook-tmpl-level-z4pm2
panic: workflow 'lifecycle-hook-tmpl-level-z4pm2' node '' uninitialized when marking as Failed: [workflow shutdown with strategy:  Stop]
goroutine 397 [running]:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).markNodePhase(0xc0005d2000, {0x0, 0x0}, {0x2150b49, 0x6}, {0xc000d14c10?, 0x1, 0x1})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2252 +0xaba
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).handleExecutionControlError(0xc0005d2000, {0xc000676180, 0x2a}, 0xc0006680f0, {0xc000676210, 0x26})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/exec_control.go:78 +0x187
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).applyExecutionControl(0xc0005d2000, 0xc000430000, 0x0?)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/exec_control.go:44 +0x612
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).podReconciliation.func2(0x0?)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1033 +0x97
created by github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).podReconciliation
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1030 +0x215
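
The panic message shows `markNodePhase` being asked to mark a node with an empty name (`node ''`) as Failed, right after the log line about terminating the `-agent` pod. A minimal Go sketch of that failure mode follows. It assumes the controller looks the node up by name and panics when no such node exists. The `workflow` and `nodeStatus` types and the `tryMark` helper are simplified, hypothetical stand-ins, not the actual argo-workflows code.

```go
package main

import "fmt"

// nodeStatus and workflow are hypothetical, stripped-down stand-ins for the
// controller's real types; only what the sketch needs is kept.
type nodeStatus struct {
	Phase   string
	Message string
}

type workflow struct {
	Name  string
	Nodes map[string]*nodeStatus
}

// markNodePhase mimics the guard implied by the panic message: a node must
// already exist in the workflow status before its phase can be changed.
func markNodePhase(wf *workflow, nodeName, phase, message string) *nodeStatus {
	node, ok := wf.Nodes[nodeName]
	if !ok {
		panic(fmt.Sprintf("workflow '%s' node '%s' uninitialized when marking as %s: [%s]",
			wf.Name, nodeName, phase, message))
	}
	node.Phase = phase
	node.Message = message
	return node
}

// tryMark turns the panic into an error so the failure can be inspected
// without crashing the whole process.
func tryMark(wf *workflow, nodeName, phase, message string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	markNodePhase(wf, nodeName, phase, message)
	return nil
}

func main() {
	wf := &workflow{
		Name:  "lifecycle-hook-tmpl-level-z4pm2",
		Nodes: map[string]*nodeStatus{"step-1": {Phase: "Running"}},
	}

	// A node that exists is updated normally.
	fmt.Println(tryMark(wf, "step-1", "Failed", "workflow shutdown with strategy:  Stop"))

	// An empty node name, as in the trace above, hits the guard.
	fmt.Println(tryMark(wf, "", "Failed", "workflow shutdown with strategy:  Stop"))
}
```

In the real controller the panic is raised inside a goroutine started by `podReconciliation` and is not recovered, which would explain why the whole workflow-controller container stops. Whether the empty name comes from the agent pod lacking a node-name mapping is an assumption based on the log line preceding the panic.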

Snipaste_2022-09-26_15-54-50.png

LingClassmate avatar Sep 26 '22 08:09 LingClassmate

My latest YAML configuration

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-tmpl-level-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: step-1
            hooks:
              exit:
                # Expr will not support `-` on variable name. Variable should wrap with `[]`
                expression: steps["step-1"].status == "Running"
                template: http
              success:
                expression: steps["step-1"].status == "Succeeded"
                template: http
              failed:
                expression: steps["step-1"].status == "Failed"
                template: http
            template: echo
        - - name: step2
            hooks:
              exit:
                expression: steps.step2.status == "Running"
                template: http
              success:
                expression: steps.step2.status == "Succeeded"
                template: http
              failed:
                expression: steps.step2.status == "Failed"
                template: http
            template: echo
    
    - name: echo
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["sleep 30 && echo \"it was heads\""]
    
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

LingClassmate avatar Sep 26 '22 08:09 LingClassmate

I'm running 3.4.0 and I have a similar issue: the lifecycle hook does not start. We are using a sidecar with Linkerd injection.

jomach avatar Sep 28 '22 15:09 jomach

Some more information. This workflow does not work: https://github.com/argoproj/argo-workflows/blob/master/examples/life-cycle-hooks-wf-level.yaml

But this one works: https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/exit-handlers.yaml

It seems that the annotations are not copied from the main pod to the second pod created by the lifecycle hook. This looks to me like a bug in the lifecycle hook option; with onExit it works.

jomach avatar Sep 29 '22 07:09 jomach

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 15 '22 22:10 stale[bot]