argo-workflows
A workflow that uses hooks does not stop when I stop it; it keeps running.
Pre-requisites
- [X] I have double-checked my configuration
- [X] I can confirm the issue exists when I tested with :latest
- [ ] I'd like to contribute the fix myself (see contributing guide)
What happened/what you expected to happen?
When a workflow uses hooks and I stop it, the workflow does not stop; it keeps running. I expected it to stop.
Workflow configuration file:
https://github.com/argoproj/argo-workflows/blob/45730a9cdeb588d0e52b1ac87b6e0ca391a95a81/examples/life-cycle-hooks-tmpl-level.yaml
I stopped the workflow, but the hook nodes stay in phase 'Pending'.
Version
v3.3.5
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-tmpl-level-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: step-1
            hooks:
              exit:
                # Expr will not support `-` on variable name. Variable should wrap with `[]`
                expression: steps["step-1"].status == "Running"
                template: http
              success:
                expression: steps["step-1"].status == "Succeeded"
                template: http
            template: echo
        - - name: step2
            hooks:
              exit:
                expression: steps.step2.status == "Running"
                template: http
              success:
                expression: steps.step2.status == "Succeeded"
                template: http
            template: echo
    - name: echo
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["sleep 30 && echo \"it was heads\""]
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"
Logs from the workflow controller
[root@k8s-master01 ~]# kubectl logs -n argo deploy/workflow-controller | grep lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.089Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.089Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.089Z" level=info msg="task-result changed" namespace=argo nodeID=lifecycle-hook-tmpl-level-frx69-2640744148 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.092Z" level=info msg="Running OnExit handler" lifeCycleHook="&LifecycleHook{Template:http,Arguments:Arguments{Parameters:[]Parameter{},Artifacts:[]Artifact{},},TemplateRef:nil,Expression:steps["step-1"].status == "Running",}" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.093Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.093Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.093Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:27.120Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=7912549 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.030Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.030Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg="Running OnExit handler" lifeCycleHook="&LifecycleHook{Template:http,Arguments:Arguments{Parameters:[]Parameter{},Artifacts:[]Artifact{},},TemplateRef:nil,Expression:steps["step-1"].status == "Running",}" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:01:37.031Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.997Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.997Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg="Running OnExit handler" lifeCycleHook="&LifecycleHook{Template:http,Arguments:Arguments{Parameters:[]Parameter{},Artifacts:[]Artifact{},},TemplateRef:nil,Expression:steps["step-1"].status == "Running",}" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg="Workflow step group node lifecycle-hook-tmpl-level-frx69-2601770046 not yet completed" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
time="2022-09-21T03:21:36.998Z" level=info msg=reconcileAgentPod namespace=argo workflow=lifecycle-hook-tmpl-level-frx69
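In the controller log above, the same "Running OnExit handler" and "not yet completed" messages recur at 03:01:27, 03:01:37 and 03:21:36, which is what a workflow stuck in a reconcile loop would look like. A minimal Python sketch for counting messages in such a log dump could look like the following. The function name `summarize` and the shortened `sample` lines are made up for illustration; only the `time`, `level` and `msg` fields are parsed, on the assumption that `msg` directly follows `level` as it does in the log shown here.

```python
import re
from collections import Counter

# Matches the leading fields of one controller log entry. Later fields such as
# lifeCycleHook contain unescaped quotes in this paste, so they are not parsed.
ENTRY = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) '
    r'msg=(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))'
)


def summarize(log_text):
    """Return a Counter mapping each msg value to how often it occurs."""
    counts = Counter()
    for match in ENTRY.finditer(log_text):
        quoted = match.group("quoted")
        counts[quoted if quoted is not None else match.group("bare")] += 1
    return counts


# Abbreviated entries taken from the log above (trailing fields shortened).
sample = (
    'time="2022-09-21T03:01:27.092Z" level=info msg="Running OnExit handler" namespace=argo\n'
    'time="2022-09-21T03:01:27.093Z" level=info msg=reconcileAgentPod namespace=argo\n'
    'time="2022-09-21T03:01:37.031Z" level=info msg="Running OnExit handler" namespace=argo\n'
)

for msg, count in summarize(sample).most_common():
    print(count, msg)
```

Because the pattern is applied with `finditer`, it also works when several entries were pasted onto a single line.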
Logs from your workflow's wait container
[root@k8s-master01 ~]# kubectl logs -c wait -l workflows.argoproj.io/workflow=lifecycle-hook-tmpl-level-frx69,workflow.argoproj.io/phase!=Succeeded
No resources found in default namespace.
@LingClassmate Can you try with v3.4.0 or v3.3.9?
@sarabala1979 The problem still exists in v3.3.9: after the stop instruction is sent, the workflow never ends and stays in the Running state.
In v3.4.0, the workflow also never ends after the stop instruction is sent and stays in the Running state. In addition, the workflow-controller container stops with the following error log.
time="2022-09-26T07:53:44.793Z" level=info msg="Processing workflow" namespace=argo workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg="task-result changed" namespace=argo nodeID=lifecycle-hook-tmpl-level-z4pm2-1774425000 workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg=updateAgentPodStatus namespace=argo workflow=lifecycle-hook-tmpl-level-z4pm2
time="2022-09-26T07:53:44.793Z" level=info msg=assessAgentPodStatus namespace=argo podName=lifecycle-hook-tmpl-level-z4pm2-1340600742-agent
time="2022-09-26T07:53:44.793Z" level=info msg="Terminating pod as part of workflow shutdown" namespace=argo podName=lifecycle-hook-tmpl-level-z4pm2-1340600742-agent shutdownStrategy=Stop workflow=lifecycle-hook-tmpl-level-z4pm2
panic: workflow 'lifecycle-hook-tmpl-level-z4pm2' node '' uninitialized when marking as Failed: [workflow shutdown with strategy: Stop]
goroutine 397 [running]:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).markNodePhase(0xc0005d2000, {0x0, 0x0}, {0x2150b49, 0x6}, {0xc000d14c10?, 0x1, 0x1})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2252 +0xaba
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).handleExecutionControlError(0xc0005d2000, {0xc000676180, 0x2a}, 0xc0006680f0, {0xc000676210, 0x26})
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/exec_control.go:78 +0x187
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).applyExecutionControl(0xc0005d2000, 0xc000430000, 0x0?)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/exec_control.go:44 +0x612
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).podReconciliation.func2(0x0?)
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1033 +0x97
created by github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).podReconciliation
	/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1030 +0x215
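The panic message names node '' (an empty node name), and the last log line before it concerns the "-agent" pod. One possible reading is that the pod being terminated has no matching node in the workflow status, so an empty name reaches markNodePhase. The Python sketch below is only a simplified model of that failure mode, not the controller's actual Go code: every class, function and ID in it (`Node`, `Workflow`, `handle_shutdown`, `handle_shutdown_guarded`, "node-1", "agent-pod-id") is hypothetical, and the guard is an illustration rather than the project's fix.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    phase: str = "Pending"


@dataclass
class Workflow:
    name: str
    nodes: dict = field(default_factory=dict)  # node ID -> Node


def mark_node_phase(wf, node_name, phase, message):
    """Hypothetical stand-in: update a known node, fail loudly for an unknown one."""
    for node in wf.nodes.values():
        if node.name == node_name:
            node.phase = phase
            return node
    raise RuntimeError(
        f"workflow '{wf.name}' node '{node_name}' uninitialized "
        f"when marking as {phase}: {message}"
    )


def handle_shutdown(wf, pod_node_id, message):
    # Assumption: an unknown ID yields an empty node, as a Go map lookup would.
    node = wf.nodes.get(pod_node_id, Node())
    return mark_node_phase(wf, node.name, "Failed", message)


def handle_shutdown_guarded(wf, pod_node_id, message):
    node = wf.nodes.get(pod_node_id)
    if node is None:
        return None  # the pod has no workflow node, so there is nothing to mark
    return mark_node_phase(wf, node.name, "Failed", message)


wf = Workflow("lifecycle-hook-tmpl-level-z4pm2", {"node-1": Node("step-1", "Running")})
reason = "workflow shutdown with strategy: Stop"
try:
    handle_shutdown(wf, "agent-pod-id", reason)
except RuntimeError as err:
    print(err)
print(handle_shutdown_guarded(wf, "agent-pod-id", reason))
```

In this model the unguarded path raises an error with the same shape as the panic above, while the guarded path skips pods that have no node entry.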
My latest YAML configuration
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-tmpl-level-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: step-1
            hooks:
              exit:
                # Expr will not support `-` on variable name. Variable should wrap with `[]`
                expression: steps["step-1"].status == "Running"
                template: http
              success:
                expression: steps["step-1"].status == "Succeeded"
                template: http
              failed:
                expression: steps["step-1"].status == "Failed"
                template: http
            template: echo
        - - name: step2
            hooks:
              exit:
                expression: steps.step2.status == "Running"
                template: http
              success:
                expression: steps.step2.status == "Succeeded"
                template: http
              failed:
                expression: steps.step2.status == "Failed"
                template: http
            template: echo
    - name: echo
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["sleep 30 && echo \"it was heads\""]
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"
I'm running 3.4.0 and I have a similar issue: the lifecycle hook does not start. We are using a sidecar with linkerd injection.
Some more information. This workflow does not work: https://github.com/argoproj/argo-workflows/blob/master/examples/life-cycle-hooks-wf-level.yaml
But this one works: https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/exit-handlers.yaml
It seems that the annotations are not copied from the main pod to the second pod created by the lifecycle hook. This looks to me like a bug in the lifecycle hook option; with onExit it works.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.