argo-workflows
Argo workflow doesn't fail for Error node
Pre-requisites
- [X] I have double-checked my configuration
- [X] I can confirm the issue exists when I tested with `:latest`
- [X] I'd like to contribute the fix myself (see contributing guide)
What happened/what you expected to happen?
What happened:
The workflow's status.phase is Succeeded even when there is an Error node.
apiVersion: argoproj.io/v1alpha1
kind: Workflow # new type of k8s spec
metadata:
  generateName: hello-world- # name of the workflow spec
spec:
  entrypoint: test # invoke the whalesay template
  templates:
    - name: instant-dummy
      suspend:
        duration: "0.1s"
    - name: test # name of the template
      dag:
        tasks:
          - name: A
            template: evaluation
            arguments:
              parameters:
                - name: INPUT
                  value: "abc"
          - name: B
            depends: A
            template: instant-dummy
    - name: evaluation
      steps:
        - - name: placeholder
            template: instant-dummy
            when: "false"
      inputs:
        parameters:
          - name: INPUT
      outputs:
        parameters:
          - name: OUTPUT
            valueFrom:
              expression: "inputs.parameters.INPUT.splitList(':')[1]"
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
  serviceAccountName: abc
What I expected: the workflow's status.phase should always be Error as long as there is an Error node. For example, the same workflow without task B does end in Error:
apiVersion: argoproj.io/v1alpha1
kind: Workflow # new type of k8s spec
metadata:
  generateName: hello-world- # name of the workflow spec
spec:
  entrypoint: test # invoke the whalesay template
  templates:
    - name: instant-dummy
      suspend:
        duration: "0.1s"
    - name: test # name of the template
      dag:
        tasks:
          - name: A
            template: evaluation
            arguments:
              parameters:
                - name: INPUT
                  value: "abc"
    - name: evaluation
      steps:
        - - name: placeholder
            template: instant-dummy
            when: "false"
      inputs:
        parameters:
          - name: INPUT
      outputs:
        parameters:
          - name: OUTPUT
            valueFrom:
              expression: "inputs.parameters.INPUT.splitList(':')[1]"
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
  serviceAccountName: abc
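The expectation above can be written down as a small rule. The following is only a sketch of the behavior requested in this report, not the controller's actual logic; the function name `expected_workflow_phase` and the severity ranking are assumptions made for illustration.

```python
from typing import Iterable

# Hypothetical ranking used only for this sketch: Error outranks Failed,
# and every other node phase (Succeeded, Skipped, Omitted, ...) is neutral.
_SEVERITY = {"Error": 2, "Failed": 1}
_PHASE_BY_SEVERITY = {2: "Error", 1: "Failed", 0: "Succeeded"}


def expected_workflow_phase(node_phases: Iterable[str]) -> str:
    """Return the workflow phase this report expects for a finished workflow.

    Any Error node makes the workflow Error; otherwise any Failed node
    makes it Failed; otherwise it is Succeeded.
    """
    worst = max((_SEVERITY.get(phase, 0) for phase in node_phases), default=0)
    return _PHASE_BY_SEVERITY[worst]


# Node phases taken from the reproduction: the DAG root, task A, its step
# group, the skipped placeholder step, and the omitted task B.
phases = ["Succeeded", "Error", "Succeeded", "Skipped", "Omitted"]
print(expected_workflow_phase(phases))
```

With the phases from the reproduction, this rule yields Error, whereas the controller reports Succeeded.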
Version
v3.4.5 and v3.4.7
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
apiVersion: argoproj.io/v1alpha1
kind: Workflow # new type of k8s spec
metadata:
  generateName: hello-world- # name of the workflow spec
spec:
  entrypoint: test # invoke the whalesay template
  templates:
    - name: instant-dummy
      suspend:
        duration: "0.1s"
    - name: test # name of the template
      dag:
        tasks:
          - name: A
            template: evaluation
            arguments:
              parameters:
                - name: INPUT
                  value: "abc"
          - name: B
            depends: A
            template: instant-dummy
    - name: evaluation
      steps:
        - - name: placeholder
            template: instant-dummy
            when: "false"
      inputs:
        parameters:
          - name: INPUT
      outputs:
        parameters:
          - name: OUTPUT
            valueFrom:
              expression: "inputs.parameters.INPUT.splitList(':')[1]"
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
### Logs from the workflow controller
```text
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
time="2023-04-27T19:40:01.202Z" level=info msg="Processing workflow" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Get configmaps 404"
time="2023-04-27T19:40:01.208Z" level=warning msg="Non-transient error: configmaps \"artifact-repositories\" not found"
time="2023-04-27T19:40:01.208Z" level=info msg="resolved artifact repository" artifactRepositoryRef=default-artifact-repository
time="2023-04-27T19:40:01.208Z" level=info msg="Updated phase -> Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="DAG node hello-world-tnq8p initialized Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="All of node hello-world-tnq8p.A dependencies [] completed" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Steps node hello-world-tnq8p-648828699 initialized Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="StepGroup node hello-world-tnq8p-2524305911 initialized Running" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Skipping hello-world-tnq8p.A[0].placeholder: when 'false' evaluated false" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Skipped node hello-world-tnq8p-625697480 initialized Skipped (message: when 'false' evaluated false)" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Step group node hello-world-tnq8p-2524305911 successful" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="node hello-world-tnq8p-2524305911 phase Running -> Succeeded" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="node hello-world-tnq8p-2524305911 finished: 2023-04-27 19:40:01.208971904 +0000 UTC" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.208Z" level=info msg="Outbound nodes of hello-world-tnq8p-625697480 is [hello-world-tnq8p-625697480]" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Outbound nodes of hello-world-tnq8p-648828699 is [hello-world-tnq8p-625697480]" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=error msg="Mark error node" error="invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test nodeName=hello-world-tnq8p.A workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 phase Running -> Error" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 message: invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 finished: 2023-04-27 19:40:01.20910768 +0000 UTC" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=error msg="Mark error node" error="task 'hello-world-tnq8p.A' errored: invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test nodeName=hello-world-tnq8p.A workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p-648828699 message: task 'hello-world-tnq8p.A' errored: invalid operation: int(string) (1:25)\n | inputs.parameters.INPUT.splitList(':')[1]\n | ........................^" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Skipped node hello-world-tnq8p-665606318 initialized Omitted (message: omitted: depends condition not met)" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Outbound nodes of hello-world-tnq8p set to [hello-world-tnq8p-665606318]" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p phase Running -> Succeeded" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="node hello-world-tnq8p finished: 2023-04-27 19:40:01.209248271 +0000 UTC" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Checking daemoned children of hello-world-tnq8p" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="TaskSet Reconciliation" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg=reconcileAgentPod namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Updated phase Running -> Succeeded" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Marking workflow completed" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Marking workflow as pending archiving" namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Checking daemoned children of " namespace=wf-fkp-test workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.209Z" level=info msg="Workflow to be dehydrated" Workflow Size=2520
time="2023-04-27T19:40:01.214Z" level=info msg="Create events 201"
time="2023-04-27T19:40:01.214Z" level=info msg="cleaning up pod" action=deletePod key=wf-fkp-test/hello-world-tnq8p-1340600742-agent/deletePod
time="2023-04-27T19:40:01.219Z" level=info msg="Update workflows 200"
time="2023-04-27T19:40:01.220Z" level=info msg="Create events 201"
time="2023-04-27T19:40:01.220Z" level=info msg="Delete pods 404"
time="2023-04-27T19:40:01.220Z" level=info msg="Workflow update successful" namespace=wf-fkp-test phase=Succeeded resourceVersion=156903209 workflow=hello-world-tnq8p
time="2023-04-27T19:40:01.223Z" level=info msg="DeleteCollection workflowtaskresults 200"
time="2023-04-27T19:40:01.224Z" level=info msg="archiving workflow" namespace=wf-fkp-test uid=a6c171f4-78c9-4d1c-898e-e0de4d181816 workflow=hello-world-tnq8p
```
### Logs from your workflow's wait container
```text
N/A
```
Can you try v3.4.7? This might have been fixed already.
Hi @terrytangyuan, thanks for the quick response. I updated the version in this issue, since I just reproduced it in v3.4.7.
We only see this issue with Error nodes. For Failed nodes, Argo Workflows always marks the workflow as Failed as long as any DAG node has failed.
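Until this is fixed, affected workflows can be spotted by comparing the workflow phase with the node phases. Below is a minimal sketch that assumes the workflow object has been exported as JSON (for example with `kubectl get workflow <name> -o json`) and relies on the `status.phase` and `status.nodes[*].phase` fields. The helper names `error_nodes` and `is_misreported` are hypothetical, and the sample data is abbreviated from the reproduction.

```python
from typing import Any


def error_nodes(workflow: dict[str, Any]) -> list[str]:
    """Return the display names of all nodes whose phase is Error."""
    nodes = workflow.get("status", {}).get("nodes") or {}
    return sorted(
        node.get("displayName", node_id)
        for node_id, node in nodes.items()
        if node.get("phase") == "Error"
    )


def is_misreported(workflow: dict[str, Any]) -> bool:
    """True when the workflow is Succeeded although it has an Error node."""
    phase = workflow.get("status", {}).get("phase")
    return phase == "Succeeded" and bool(error_nodes(workflow))


# Abbreviated, illustrative status resembling the reproduction above.
sample = {
    "status": {
        "phase": "Succeeded",
        "nodes": {
            "hello-world-tnq8p": {"displayName": "hello-world-tnq8p", "phase": "Succeeded"},
            "hello-world-tnq8p-648828699": {"displayName": "A", "phase": "Error"},
            "hello-world-tnq8p-665606318": {"displayName": "B", "phase": "Omitted"},
        },
    }
}

if is_misreported(sample):
    print("Succeeded workflow contains Error nodes:", error_nodes(sample))
```

To check a real workflow, the dictionary could be loaded with `json.load(sys.stdin)` from the `kubectl` output instead of the inline sample.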
Thanks. I am able to reproduce.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
Are there any updates? If not, could you assign it to me? @agilgur5
> Are there any updates?

Any updates would be in the issue. Please see https://sindresorhus.com/blog/issue-bumping & https://justinmayer.com/posts/any-updates/.
> If not, could you assign it to me? @agilgur5

You don't need to be assigned to work on something; you can open a PR directly.