HTTP Template Step Manual Retry Hangs After First Retry Failure
Pre-requisites
- [X] I have double-checked my configuration
- [X] I can confirm the issue exists when I tested with :latest
- [x] I'd like to contribute the fix myself (see contributing guide)
What happened/what you expected to happen?
When a failed Argo steps workflow is manually retried and the step fails again, the workflow hangs, regardless of the configured retry strategy. Could the global onExit hook be causing this? Ideally, the complete retry strategy would be applied again when a manually retried step fails.
Details
Pod Failure Message
step group deemed errored due to child nss-test-failure-test-pggwg[0].failData error: no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:
Version
v3.4.5
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
metadata:
  name: test-failure
  generateName: test-failure-
  namespace: argo
spec:
  templates:
    - name: start-test-fail
      steps:
        - - name: failData
            template: get-fail-page
            arguments:
              parameters:
                - name: url
                  value: https://google.com/page503
                - name: httpmethod
                  value: POST
                - name: data
                  value: '{ "testVal": "testVal" }'
    - name: get-fail-page
      inputs:
        parameters:
          - name: url
            value: '{{inputs.parameters.url}}'
          - name: httpmethod
            value: '{{inputs.parameters.httpmethod}}'
          - name: data
            value: '{{inputs.parameters.body}}'
          - name: success
            value: '{{inputs.parameters.success}}'
      http:
        method: '{{inputs.parameters.httpmethod}}'
        url: '{{inputs.parameters.url}}'
        headers:
          - name: accept
            value: application/json
        timeoutSeconds: 120
        body: '{{inputs.parameters.data}}'
      retryStrategy:
        limit: '2'
        retryPolicy: Always
  entrypoint: start-test-fail
Logs from the workflow controller
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
time="2023-09-26T19:51:26.325Z" level=info msg=reconcileAgentPod namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:26.350Z" level=info msg="Workflow update successful" namespace=new-store-setup phase=Running resourceVersion=385865971 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.367Z" level=info msg="Processing workflow" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.367Z" level=info msg="Task-result reconciliation" namespace=new-store-setup numObjs=0 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.367Z" level=info msg=updateAgentPodStatus namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.367Z" level=info msg=assessAgentPodStatus namespace=new-store-setup podName=nss-test-failure-test-pggwg-1340600742-agent
time="2023-09-26T19:51:36.368Z" level=info msg="No more retries left. Failing..." namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg-3009483189 phase Running -> Failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg-3009483189 message: No more retries left" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg-3009483189 finished: 2023-09-26 19:51:36.36847328 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="Step group node nss-test-failure-test-pggwg-1321674353 deemed failed: child 'nss-test-failure-test-pggwg-3009483189' failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg-1321674353 phase Running -> Failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg-1321674353 message: child 'nss-test-failure-test-pggwg-3009483189' failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg-1321674353 finished: 2023-09-26 19:51:36.368547381 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="step group nss-test-failure-test-pggwg-1321674353 was unsuccessful: child 'nss-test-failure-test-pggwg-3009483189' failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg-3009483189 is [nss-test-failure-test-pggwg-2963437778]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg is [nss-test-failure-test-pggwg-2963437778]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg phase Running -> Failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg message: child 'nss-test-failure-test-pggwg-3009483189' failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="node nss-test-failure-test-pggwg finished: 2023-09-26 19:51:36.368642282 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="Checking daemoned children of nss-test-failure-test-pggwg" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="TaskSet Reconciliation" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg=reconcileAgentPod namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.368Z" level=info msg="Running OnExit handler: exit-handler" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.369Z" level=info msg="Steps node nss-test-failure-test-pggwg-3792583316 initialized Running" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.369Z" level=info msg="StepGroup node nss-test-failure-test-pggwg-3094081150 initialized Running" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.369Z" level=info msg="Pod node nss-test-failure-test-pggwg-881966844 initialized Pending" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.425Z" level=info msg="Created pod: nss-test-failure-test-pggwg.onExit[0].notify-slack (nss-test-failure-test-pggwg-notify-slack-881966844)" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.425Z" level=info msg="Pod node nss-test-failure-test-pggwg-2150953069 initialized Pending" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.464Z" level=info msg="Created pod: nss-test-failure-test-pggwg.onExit[0].send-alert (nss-test-failure-test-pggwg-send-alert-2150953069)" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.464Z" level=info msg="Workflow step group node nss-test-failure-test-pggwg-3094081150 not yet completed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:36.491Z" level=info msg="Workflow update successful" namespace=new-store-setup phase=Running resourceVersion=385866288 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.427Z" level=info msg="Processing workflow" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.427Z" level=info msg="Task-result reconciliation" namespace=new-store-setup numObjs=0 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.427Z" level=info msg=updateAgentPodStatus namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.427Z" level=info msg=assessAgentPodStatus namespace=new-store-setup podName=nss-test-failure-test-pggwg-1340600742-agent
time="2023-09-26T19:51:46.427Z" level=info msg="node changed" namespace=new-store-setup new.message= new.phase=Succeeded new.progress=0/1 nodeID=nss-test-failure-test-pggwg-2150953069 old.message= old.phase=Pending old.progress=0/1 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.427Z" level=info msg="node changed" namespace=new-store-setup new.message= new.phase=Succeeded new.progress=0/1 nodeID=nss-test-failure-test-pggwg-881966844 old.message= old.phase=Pending old.progress=0/1 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="TaskSet Reconciliation" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg=reconcileAgentPod namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="Running OnExit handler: exit-handler" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="Step group node nss-test-failure-test-pggwg-3094081150 successful" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="node nss-test-failure-test-pggwg-3094081150 phase Running -> Succeeded" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="node nss-test-failure-test-pggwg-3094081150 finished: 2023-09-26 19:51:46.42883277 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg-881966844 is [nss-test-failure-test-pggwg-881966844]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg-2150953069 is [nss-test-failure-test-pggwg-2150953069]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg-3792583316 is [nss-test-failure-test-pggwg-881966844 nss-test-failure-test-pggwg-2150953069]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="node nss-test-failure-test-pggwg-3792583316 phase Running -> Succeeded" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="node nss-test-failure-test-pggwg-3792583316 finished: 2023-09-26 19:51:46.428980172 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.428Z" level=info msg="Checking daemoned children of nss-test-failure-test-pggwg-3792583316" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.429Z" level=info msg="Updated phase Running -> Failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.429Z" level=info msg="Updated message -> child 'nss-test-failure-test-pggwg-3009483189' failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.429Z" level=info msg="Marking workflow completed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.429Z" level=info msg="Checking daemoned children of " namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.434Z" level=info msg="cleaning up pod" action=deletePod key=new-store-setup/nss-test-failure-test-pggwg-1340600742-agent/deletePod
time="2023-09-26T19:51:46.456Z" level=info msg="Workflow update successful" namespace=new-store-setup phase=Failed resourceVersion=385866549 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:51:46.458Z" level=info msg="Queueing Failed workflow new-store-setup/nss-test-failure-test-pggwg for delete in 120h0m0s due to TTL"
time="2023-09-26T19:51:46.478Z" level=info msg="cleaning up pod" action=labelPodCompleted key=new-store-setup/nss-test-failure-test-pggwg-notify-slack-881966844/labelPodCompleted
time="2023-09-26T19:51:46.478Z" level=info msg="cleaning up pod" action=labelPodCompleted key=new-store-setup/nss-test-failure-test-pggwg-send-alert-2150953069/labelPodCompleted
time="2023-09-26T19:52:14.991Z" level=info msg="Processing workflow" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:14.991Z" level=info msg="Task-result reconciliation" namespace=new-store-setup numObjs=0 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:14.991Z" level=info msg="Retry node nss-test-failure-test-pggwg-3009483189 initialized Running" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:14.991Z" level=info msg="HTTP node nss-test-failure-test-pggwg-3433505300 initialized Pending" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:14.991Z" level=info msg="Workflow step group node nss-test-failure-test-pggwg-1321674353 not yet completed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:14.991Z" level=info msg="TaskSet Reconciliation" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:14.991Z" level=info msg="Creating TaskSet" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:15.022Z" level=info msg=reconcileAgentPod namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:15.064Z" level=info msg="Created Agent pod" namespace=new-store-setup podName=nss-test-failure-test-pggwg-1340600742-agent workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:15.064Z" level=info msg=updateAgentPodStatus namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:15.064Z" level=info msg=assessAgentPodStatus namespace=new-store-setup podName=nss-test-failure-test-pggwg-1340600742-agent
time="2023-09-26T19:52:15.089Z" level=info msg="Workflow update successful" namespace=new-store-setup phase=Running resourceVersion=385867330 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.023Z" level=info msg="Processing workflow" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.024Z" level=info msg="Task-result reconciliation" namespace=new-store-setup numObjs=0 workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.024Z" level=info msg=updateAgentPodStatus namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.024Z" level=info msg=assessAgentPodStatus namespace=new-store-setup podName=nss-test-failure-test-pggwg-1340600742-agent
time="2023-09-26T19:52:25.024Z" level=info msg="1 child nodes of nss-test-failure-test-pggwg[0].failData failed. Trying again..." namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=error msg="no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-1321674353] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-1321674353:{ID:nss-test-failure-test-pggwg-1321674353 Name:nss-test-failure-test-pggwg[0] DisplayName:[0] Type:StepGroup TemplateName: TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3009483189] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2829363921:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2963437778:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil 
Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3009483189:{ID:nss-test-failure-test-pggwg-3009483189 Name:nss-test-failure-test-pggwg[0].failData DisplayName:failData Type:Retry TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:nil Children:[nss-test-failure-test-pggwg-3433505300 nss-test-failure-test-pggwg-2829363921] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3433505300:{ID:nss-test-failure-test-pggwg-3433505300 Name:nss-test-failure-test-pggwg[0].failData(0) DisplayName:failData(0) Type:HTTP TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Failed BoundaryID:nss-test-failure-test-pggwg Message:received non-2xx response code: 404 StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> 
Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{},Result:*<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/page503</code> was not found 
on this server. <ins>That’s all we know.</ins>\n,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3792583316:{ID:nss-test-failure-test-pggwg-3792583316 Name:nss-test-failure-test-pggwg.onExit DisplayName:nss-test-failure-test-pggwg.onExit Type:Steps TemplateName:exit-handler TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:2/2 ResourcesDuration:7s*(100Mi memory),7s*(1 cpu) PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3094081150] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="Updated phase Running -> Error" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="Updated message -> no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-1321674353] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-1321674353:{ID:nss-test-failure-test-pggwg-1321674353 Name:nss-test-failure-test-pggwg[0] DisplayName:[0] Type:StepGroup TemplateName: TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3009483189] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2829363921:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2963437778:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil 
Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3009483189:{ID:nss-test-failure-test-pggwg-3009483189 Name:nss-test-failure-test-pggwg[0].failData DisplayName:failData Type:Retry TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:nil Children:[nss-test-failure-test-pggwg-3433505300 nss-test-failure-test-pggwg-2829363921] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3433505300:{ID:nss-test-failure-test-pggwg-3433505300 Name:nss-test-failure-test-pggwg[0].failData(0) DisplayName:failData(0) Type:HTTP TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Failed BoundaryID:nss-test-failure-test-pggwg Message:received non-2xx response code: 404 StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> 
Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{},Result:*<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/page503</code> was not found 
on this server. <ins>That’s all we know.</ins>\n,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3792583316:{ID:nss-test-failure-test-pggwg-3792583316 Name:nss-test-failure-test-pggwg.onExit DisplayName:nss-test-failure-test-pggwg.onExit Type:Steps TemplateName:exit-handler TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:2/2 ResourcesDuration:7s*(100Mi memory),7s*(1 cpu) PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3094081150] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="Marking workflow completed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=error msg="Mark error node" error="step group deemed errored due to child nss-test-failure-test-pggwg[0].failData error: no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-1321674353] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-1321674353:{ID:nss-test-failure-test-pggwg-1321674353 Name:nss-test-failure-test-pggwg[0] DisplayName:[0] Type:StepGroup TemplateName: TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3009483189] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2829363921:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2963437778:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 
+0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggw-3009483189:{ID:nss-test-failure-test-pggwg-3009483189 Name:nss-test-failure-test-pggwg[0].failData DisplayName:failData Type:Retry TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:nil Children:[nss-test-failure-test-pggwg-3433505300 nss-test-failure-test-pggwg-2829363921] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3433505300:{ID:nss-test-failure-test-pggwg-3433505300 Name:nss-test-failure-test-pggwg[0].failData(0) DisplayName:failData(0) Type:HTTP TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Failed BoundaryID:nss-test-failure-test-pggwg Message:received non-2xx response code: 404 StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> 
Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{},Result:*<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/page503</code> was not found 
on this server. <ins>That’s all we know.</ins>\n,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3792583316:{ID:nss-test-failure-test-pggwg-3792583316 Name:nss-test-failure-test-pggwg.onExit DisplayName:nss-test-failure-test-pggwg.onExit Type:Steps TemplateName:exit-handler TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:2/2 ResourcesDuration:7s*(100Mi memory),7s*(1 cpu) PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3094081150] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=new-store-setup nodeName="nss-test-failure-test-pggwg[0]" workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="node nss-test-failure-test-pggwg-1321674353 phase Running -> Error" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="node nss-test-failure-test-pggwg-1321674353 message: step group deemed errored due to child nss-test-failure-test-pggwg[0].failData error: no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-1321674353] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-1321674353:{ID:nss-test-failure-test-pggwg-1321674353 Name:nss-test-failure-test-pggwg[0] DisplayName:[0] Type:StepGroup TemplateName: TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3009483189] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2829363921:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2963437778:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC 
FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3009483189:{ID:nss-test-failure-test-pggwg-3009483189 Name:nss-test-failure-test-pggwg[0].failData DisplayName:failData Type:Retry TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:nil Children:[nss-test-failure-test-pggwg-3433505300 nss-test-failure-test-pggwg-2829363921] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3433505300:{ID:nss-test-failure-test-pggwg-3433505300 Name:nss-test-failure-test-pggwg[0].failData(0) DisplayName:failData(0) Type:HTTP TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Failed BoundaryID:nss-test-failure-test-pggwg Message:received non-2xx response code: 404 StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> 
Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{},Result:*<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/page503</code> was not found 
on this server. <ins>That’s all we know.</ins>\n,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3792583316:{ID:nss-test-failure-test-pggwg-3792583316 Name:nss-test-failure-test-pggwg.onExit DisplayName:nss-test-failure-test-pggwg.onExit Type:Steps TemplateName:exit-handler TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:2/2 ResourcesDuration:7s*(100Mi memory),7s*(1 cpu) PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3094081150] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="node nss-test-failure-test-pggwg-1321674353 finished: 2023-09-26 19:52:25.025811902 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="step group nss-test-failure-test-pggwg-1321674353 was unsuccessful: step group deemed errored due to child nss-test-failure-test-pggwg[0].failData error: no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-1321674353] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-1321674353:{ID:nss-test-failure-test-pggwg-1321674353 Name:nss-test-failure-test-pggwg[0] DisplayName:[0] Type:StepGroup TemplateName: TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3009483189] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2829363921:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2963437778:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 
00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3009483189:{ID:nss-test-failure-test-pggwg-3009483189 Name:nss-test-failure-test-pggwg[0].failData DisplayName:failData Type:Retry TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:nil Children:[nss-test-failure-test-pggwg-3433505300 nss-test-failure-test-pggwg-2829363921] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3433505300:{ID:nss-test-failure-test-pggwg-3433505300 Name:nss-test-failure-test-pggwg[0].failData(0) DisplayName:failData(0) Type:HTTP TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Failed BoundaryID:nss-test-failure-test-pggwg Message:received non-2xx response code: 404 StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> 
Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{},Result:*<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/page503</code> was not found 
on this server. <ins>That’s all we know.</ins>\n,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3792583316:{ID:nss-test-failure-test-pggwg-3792583316 Name:nss-test-failure-test-pggwg.onExit DisplayName:nss-test-failure-test-pggwg.onExit Type:Steps TemplateName:exit-handler TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:2/2 ResourcesDuration:7s*(100Mi memory),7s*(1 cpu) PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3094081150] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg-3009483189 is [nss-test-failure-test-pggwg-2829363921]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.025Z" level=info msg="Outbound nodes of nss-test-failure-test-pggwg is [nss-test-failure-test-pggwg-2829363921]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=error msg="node is already fulfilled" fromPhase=Error namespace=new-store-setup nodeName=nss-test-failure-test-pggwg toPhase=Failed workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="node nss-test-failure-test-pggwg phase Error -> Failed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="node nss-test-failure-test-pggwg message: step group deemed errored due to child nss-test-failure-test-pggwg[0].failData error: no Node found by the name of ; wf.Status.Nodes=map[nss-test-failure-test-pggwg:{ID:nss-test-failure-test-pggwg Name:nss-test-failure-test-pggwg DisplayName:nss-test-failure-test-pggwg Type:Steps TemplateName:start-test-fail TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-1321674353] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-1321674353:{ID:nss-test-failure-test-pggwg-1321674353 Name:nss-test-failure-test-pggwg[0] DisplayName:[0] Type:StepGroup TemplateName: TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3009483189] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2829363921:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-2963437778:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase:Failed BoundaryID: Message:context canceled StartedAt:0001-01-01 00:00:00 +0000 UTC 
FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3009483189:{ID:nss-test-failure-test-pggwg-3009483189 Name:nss-test-failure-test-pggwg[0].failData DisplayName:failData Type:Retry TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID:nss-test-failure-test-pggwg Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:nil Children:[nss-test-failure-test-pggwg-3433505300 nss-test-failure-test-pggwg-2829363921] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3433505300:{ID:nss-test-failure-test-pggwg-3433505300 Name:nss-test-failure-test-pggwg[0].failData(0) DisplayName:failData(0) Type:HTTP TemplateName:get-fail-page TemplateRef:nil TemplateScope:local/ Phase:Failed BoundaryID:nss-test-failure-test-pggwg Message:received non-2xx response code: 404 StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:2023-09-26 19:52:14 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned:<nil> 
Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:url,Default:nil,Value:*https://google.com/page503,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:httpmethod,Default:nil,Value:*POST,ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},Parameter{Name:success,Default:nil,Value:*{{inputs.parameters.success}},ValueFrom:nil,GlobalName:,Enum:[],Description:nil,},},Artifacts:[]Artifact{},} Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{},Result:*<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL 
<code>/page503</code> was not found on this server. <ins>That’s all we know.</ins>\n,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} nss-test-failure-test-pggwg-3792583316:{ID:nss-test-failure-test-pggwg-3792583316 Name:nss-test-failure-test-pggwg.onExit DisplayName:nss-test-failure-test-pggwg.onExit Type:Steps TemplateName:exit-handler TemplateRef:nil TemplateScope:local/ Phase:Running BoundaryID: Message: StartedAt:2023-09-26 19:52:14 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:2/2 ResourcesDuration:7s*(100Mi memory),7s*(1 cpu) PodIP: Daemoned:<nil> Inputs:nil Outputs:nil Children:[nss-test-failure-test-pggwg-3094081150] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="node nss-test-failure-test-pggwg finished: 2023-09-26 19:52:25.026166907 +0000 UTC" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="Checking daemoned children of nss-test-failure-test-pggwg" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="TaskSet Reconciliation" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg=reconcileAgentPod namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="Running OnExit handler: exit-handler" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.026Z" level=info msg="StepGroup node nss-test-failure-test-pggwg-3094081150 initialized Running" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.027Z" level=info msg="Pod node nss-test-failure-test-pggwg-881966844 initialized Pending" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.031Z" level=info msg="cleaning up pod" action=deletePod key=new-store-setup/nss-test-failure-test-pggwg-1340600742-agent/deletePod
time="2023-09-26T19:52:25.065Z" level=info msg="Created pod: nss-test-failure-test-pggwg.onExit[0].notify-slack (nss-test-failure-test-pggwg-notify-slack-881966844)" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.065Z" level=info msg="Pod node nss-test-failure-test-pggwg-2150953069 initialized Pending" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.110Z" level=info msg="Created pod: nss-test-failure-test-pggwg.onExit[0].send-alert (nss-test-failure-test-pggwg-send-alert-2150953069)" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.110Z" level=info msg="Workflow step group node nss-test-failure-test-pggwg-3094081150 not yet completed" namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.110Z" level=info msg="Checking daemoned children of " namespace=new-store-setup workflow=nss-test-failure-test-pggwg
time="2023-09-26T19:52:25.150Z" level=info msg="Workflow update successful" namespace=new-store-setup phase=Error resourceVersion=385867632 workflow=nss-test-failure-test-pggwg
Logs from your workflow's wait container
kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
No resources found in argo namespace.
Tested with multiple scenarios, including:
- OnExit - workflow level
- Hook - workflow level
- Hook - step level
All produce the same result.
Issue linked to this PR - https://github.com/argoproj/argo-workflows/pull/11839
I think the root cause is the TaskSets that were saved by the previous execution.
If the previous execution left any outputs behind, reconciliation treats them as an error and stops processing.
- https://github.com/argoproj/argo-workflows/blob/release-3.4.11/workflow/controller/taskset.go#L138-L143
We can see the following logs when retrying:
time="2023-10-01T16:09:59.384Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-failure-2-l5f9g
time="2023-10-01T16:09:59.385Z" level=warning msg="[SPECIAL][DEBUG] returning but assumed validity before" namespace=argo workflow=test-failure-2-l5f9g
time="2023-10-01T16:09:59.385Z" level=error msg="[DEBUG] Was unable to obtain node for test-failure-2-l5f9g-2934565976" namespace=argo workflow=test-failure-2-l5f9g
If we delete TaskSets before retrying, the workflow retries successfully.
@wesleyscholl
Are you interested in submitting a PR?
I think this issue can be solved by not returning here and simply continuing instead.
(I'm asking because you checked the third box :) )
I'm unable to build - https://github.com/argoproj/argo-workflows/discussions/11936
I tried building with dev containers and with make start, on both Mac and Windows.
I've updated the discussion above with the errors.
I'll have to try building on my computer at home.
Is it possible to access the UI in GitHub Codespaces? This is what I see after running make start UI=true.
■ port-forwa running [9000] Handling connection for 9000
■ controller running [9090] time="2023-10-30T15:30:47.272Z" level=debug msg="Update leases 200"
■ server running [2746] time="2023-10-30T15:27:11.585Z" level=info msg="Alloc=11589 TotalAlloc=19750 Sys=23141 NumGC=6 Goroutines=105"
■ ui running [8080] webpack 5.89.0 compiled with 42 warnings in 24484 ms
v0.1.14 8m44s logs in logs [1..4+Enter] enable logging at ERROR..DEBUG [0+Enter] disable logging
@toyamagu-2021 I'm finally able to build using GitHub Codespaces.
When deleting TaskSets with:
woc.deleteTaskSet(ctx)
I get the following error:
time="2023-10-30T18:05:40.035Z" level=debug msg="ignore signal child exited" argo=true
time="2023-10-30T18:05:41.008Z" level=info msg="sub-process exited" argo=true error="<nil>"
I've also tried replacing return err with continue and it produces the same error.
More detailed logs from controller:
time="2023-10-30T18:49:10.376Z" level=info msg="Processing workflow" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=debug msg="Evaluating node nss-test-failure-test-sgpnj: template: *v1alpha1.WorkflowStep (start-test-fail), boundaryID: " namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (start-test-fail)"
time="2023-10-30T18:49:10.376Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (start-test-fail)"
time="2023-10-30T18:49:10.376Z" level=debug msg="Getting the template by name: start-test-fail" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (start-test-fail)"
time="2023-10-30T18:49:10.376Z" level=debug msg="Executing node nss-test-failure-test-sgpnj of Steps is Running" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=debug msg="Step group node &NodeStatus{ID:nss-test-failure-test-sgpnj-3770584575,Name:nss-test-failure-test-sgpnj[0],DisplayName:[0],Type:StepGroup,TemplateName:,TemplateRef:nil,Phase:Succeeded,BoundaryID:nss-test-failure-test-sgpnj,Message:,StartedAt:2023-10-30 18:48:18 +0000 UTC,FinishedAt:2023-10-30 18:48:26 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[nss-test-failure-test-sgpnj-311221617],OutboundNodes:[],TemplateScope:local/,ResourcesDuration:ResourcesDuration{cpu: 5s,memory: 3s,},HostNodeName:,MemoizationStatus:nil,EstimatedDuration:7,SynchronizationStatus:nil,Progress:1/4,NodeFlag:&NodeFlag{Hooked:false,Retried:false,},} already marked completed" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (node)"
time="2023-10-30T18:49:10.376Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (node)"
time="2023-10-30T18:49:10.376Z" level=debug msg="Getting the template by name: node" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (node)"
time="2023-10-30T18:49:10.376Z" level=info msg="SG Outbound nodes of nss-test-failure-test-sgpnj-311221617 are [nss-test-failure-test-sgpnj-311221617]" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=debug msg="Evaluating node nss-test-failure-test-sgpnj[1].get-stores-status: template: *v1alpha1.WorkflowStep (http-retry), boundaryID: nss-test-failure-test-sgpnj" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.376Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (http-retry)"
time="2023-10-30T18:49:10.376Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (http-retry)"
time="2023-10-30T18:49:10.377Z" level=debug msg="Getting the template by name: http-retry" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (http-retry)"
time="2023-10-30T18:49:10.377Z" level=debug msg="unresolved is allowed " error=unresolved
time="2023-10-30T18:49:10.377Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (start-test-fail)"
time="2023-10-30T18:49:10.377Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (start-test-fail)"
time="2023-10-30T18:49:10.377Z" level=debug msg="Getting the template by name: start-test-fail" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (start-test-fail)"
time="2023-10-30T18:49:10.377Z" level=debug msg="Inject a retry node for node nss-test-failure-test-sgpnj[1].get-stores-status" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=debug msg="Initializing node nss-test-failure-test-sgpnj[1].get-stores-status: template: *v1alpha1.WorkflowStep (http-retry), boundaryID: nss-test-failure-test-sgpnj" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=info msg="Retry node nss-test-failure-test-sgpnj-1314530950 initialized Running" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=debug msg="Initializing node nss-test-failure-test-sgpnj[1].get-stores-status(0): template: *v1alpha1.WorkflowStep (http-retry), boundaryID: nss-test-failure-test-sgpnj" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=info msg="HTTP node nss-test-failure-test-sgpnj-1520507653 initialized Pending" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=info msg="Workflow step group node nss-test-failure-test-sgpnj-2764074530 not yet completed" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=warning msg="[SPECIAL][DEBUG] returning but assumed validity before" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=error msg="[DEBUG] Was unable to obtain node for nss-test-failure-test-sgpnj-1318882035" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.377Z" level=error msg="error in workflowtaskset reconciliation" error="key was not found for nss-test-failure-test-sgpnj-1318882035" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.378Z" level=debug msg="Log changes patch: {\"status\":{\"nodes\":{\"nss-test-failure-test-sgpnj-1314530950\":{\"boundaryID\":\"nss-test-failure-test-sgpnj\",\"children\":[\"nss-test-failure-test-sgpnj-1520507653\"],\"displayName\":\"get-stores-status\",\"estimatedDuration\":12,\"finishedAt\":null,\"id\":\"nss-test-failure-test-sgpnj-1314530950\",\"inputs\":{\"parameters\":[{\"name\":\"url\",\"value\":\"https://google.com\"}]},\"name\":\"nss-test-failure-test-sgpnj[1].get-stores-status\",\"phase\":\"Running\",\"startedAt\":\"2023-10-30T18:49:10Z\",\"templateName\":\"http-retry\",\"templateScope\":\"local/\",\"type\":\"Retry\"},\"nss-test-failure-test-sgpnj-1520507653\":{\"boundaryID\":\"nss-test-failure-test-sgpnj\",\"displayName\":\"get-stores-status(0)\",\"estimatedDuration\":19,\"finishedAt\":null,\"id\":\"nss-test-failure-test-sgpnj-1520507653\",\"inputs\":{\"parameters\":[{\"name\":\"url\",\"value\":\"https://google.com\"}]},\"name\":\"nss-test-failure-test-sgpnj[1].get-stores-status(0)\",\"nodeFlag\":{\"retried\":true},\"phase\":\"Pending\",\"startedAt\":\"2023-10-30T18:49:10Z\",\"templateName\":\"http-retry\",\"templateScope\":\"local/\",\"type\":\"HTTP\"},\"nss-test-failure-test-sgpnj-2764074530\":{\"children\":[\"nss-test-failure-test-sgpnj-1314530950\"]}}}}"
time="2023-10-30T18:49:10.378Z" level=warning msg="Coudn't obtain child for nss-test-failure-test-sgpnj-2732484172, panicking"
time="2023-10-30T18:49:10.378Z" level=info msg="Workflow to be dehydrated" Workflow Size=4782
time="2023-10-30T18:49:10.384Z" level=debug msg="Update workflows 200"
time="2023-10-30T18:49:10.386Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=13385 workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:10.386Z" level=debug msg="Event(v1.ObjectReference{Kind:\"Workflow\", Namespace:\"argo\", Name:\"nss-test-failure-test-sgpnj\", UID:\"9b8ab2ae-511d-48ef-855e-7ffe7611439c\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"13385\", FieldPath:\"\"}): type: 'Normal' reason: 'WorkflowNodeRunning' Running node nss-test-failure-test-sgpnj[1].get-stores-status"
time="2023-10-30T18:49:10.397Z" level=debug msg="Patch events 200"
time="2023-10-30T18:49:10.400Z" level=debug msg="Patch workflowtasksets 200"
time="2023-10-30T18:49:11.376Z" level=info msg="Processing workflow" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=debug msg="Evaluating node nss-test-failure-test-sgpnj: template: *v1alpha1.WorkflowStep (start-test-fail), boundaryID: " namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (start-test-fail)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (start-test-fail)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Getting the template by name: start-test-fail" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (start-test-fail)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Executing node nss-test-failure-test-sgpnj of Steps is Running" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=debug msg="Step group node &NodeStatus{ID:nss-test-failure-test-sgpnj-3770584575,Name:nss-test-failure-test-sgpnj[0],DisplayName:[0],Type:StepGroup,TemplateName:,TemplateRef:nil,Phase:Succeeded,BoundaryID:nss-test-failure-test-sgpnj,Message:,StartedAt:2023-10-30 18:48:18 +0000 UTC,FinishedAt:2023-10-30 18:48:26 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[nss-test-failure-test-sgpnj-311221617],OutboundNodes:[],TemplateScope:local/,ResourcesDuration:ResourcesDuration{cpu: 5s,memory: 3s,},HostNodeName:,MemoizationStatus:nil,EstimatedDuration:7,SynchronizationStatus:nil,Progress:1/2,NodeFlag:&NodeFlag{Hooked:false,Retried:false,},} already marked completed" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (node)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (node)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Getting the template by name: node" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.NodeStatus (node)"
time="2023-10-30T18:49:11.377Z" level=info msg="SG Outbound nodes of nss-test-failure-test-sgpnj-311221617 are [nss-test-failure-test-sgpnj-311221617]" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=debug msg="Evaluating node nss-test-failure-test-sgpnj[1].get-stores-status: template: *v1alpha1.WorkflowStep (http-retry), boundaryID: nss-test-failure-test-sgpnj" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (http-retry)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (http-retry)"
time="2023-10-30T18:49:11.377Z" level=debug msg="Getting the template by name: http-retry" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (http-retry)"
time="2023-10-30T18:49:11.377Z" level=debug msg="unresolved is allowed " error=unresolved
time="2023-10-30T18:49:11.377Z" level=debug msg="Executing node nss-test-failure-test-sgpnj[1].get-stores-status of Retry is Running" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=info msg="Workflow step group node nss-test-failure-test-sgpnj-2764074530 not yet completed" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=warning msg="[SPECIAL][DEBUG] returning but assumed validity before" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=error msg="[DEBUG] Was unable to obtain node for nss-test-failure-test-sgpnj-1318882035" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.377Z" level=error msg="error in workflowtaskset reconciliation" error="key was not found for nss-test-failure-test-sgpnj-1318882035" namespace=argo workflow=nss-test-failure-test-sgpnj
time="2023-10-30T18:49:11.442Z" level=debug msg="Syncing all CronWorkflows"
time="2023-10-30T18:49:12.004Z" level=debug msg="Get leases 200"
time="2023-10-30T18:49:12.007Z" level=debug msg="Update leases 200"
time="2023-10-30T18:49:17.010Z" level=debug msg="Get leases 200"
time="2023-10-30T18:49:17.013Z" level=debug msg="Update leases 200"
time="2023-10-30T18:49:21.443Z" level=debug msg="Syncing all CronWorkflows"
time="2023-10-30T18:49:22.016Z" level=debug msg="Get leases 200"
time="2023-10-30T18:49:22.019Z" level=debug msg="Update leases 200"
time="2023-10-30T18:49:23.244Z" level=info msg="cleaning up pod" action=killContainers key=argo/nss-test-failure-test-sgpnj-node-2560466722/killContainers
time="2023-10-30T18:49:23.249Z" level=info msg="cleaning up pod" action=killContainers key=argo/nss-test-failure-test-sgpnj-node-3806041575/killContainers
I think the root cause is the TaskSets which was previously saved. If there are some outputs created by the previous execution, reconciliation treats that as an error and stops processing.
- https://github.com/argoproj/argo-workflows/blob/release-3.4.11/workflow/controller/taskset.go#L138-L143
We can see the following logs when retrying:
time="2023-10-01T16:09:59.384Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-failure-2-l5f9g
time="2023-10-01T16:09:59.385Z" level=warning msg="[SPECIAL][DEBUG] returning but assumed validity before" namespace=argo workflow=test-failure-2-l5f9g
time="2023-10-01T16:09:59.385Z" level=error msg="[DEBUG] Was unable to obtain node for test-failure-2-l5f9g-2934565976" namespace=argo workflow=test-failure-2-l5f9g
If we delete TaskSets before retrying, the workflow retries successfully.
Yes, the reason is that the completed node statuses in TaskSets were not deleted successfully. It is the same bug as #11489.
@wesleyscholl
Are you interested in submitting a PR? I think this issue is solved by not using return here, but simply continue. (I'm asking you because you've checked the 3rd box :) )
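To make the return-versus-continue suggestion concrete, here is a minimal, self-contained sketch. It is not the controller's actual code: the nodeResult type, the reconcileStrict and reconcileLenient functions, and the wf-... IDs are hypothetical stand-ins, and the only assumption taken from the thread is that a stale TaskSet entry refers to a node ID that no longer exists in the workflow status after a manual retry.

```go
package main

import "fmt"

// nodeResult is a hypothetical stand-in for one result stored in a TaskSet.
type nodeResult struct {
	Phase string
}

// reconcileStrict models the behaviour described in this thread: the first
// result whose node ID is unknown aborts the whole pass, so later (valid)
// results are never applied.
func reconcileStrict(nodes map[string]string, results map[string]nodeResult, order []string) error {
	for _, id := range order {
		if _, ok := nodes[id]; !ok {
			return fmt.Errorf("key was not found for %s", id)
		}
		nodes[id] = results[id].Phase
	}
	return nil
}

// reconcileLenient models the suggested change: unknown node IDs are
// recorded and skipped, and the remaining results are still applied.
func reconcileLenient(nodes map[string]string, results map[string]nodeResult, order []string) []string {
	var skipped []string
	for _, id := range order {
		if _, ok := nodes[id]; !ok {
			skipped = append(skipped, id)
			continue
		}
		nodes[id] = results[id].Phase
	}
	return skipped
}

func main() {
	staleID := "wf-1318882035" // left over from the run before the manual retry (hypothetical ID)
	liveID := "wf-1520507653"  // node created by the manual retry (hypothetical ID)
	order := []string{staleID, liveID}
	results := map[string]nodeResult{
		staleID: {Phase: "Failed"},
		liveID:  {Phase: "Succeeded"},
	}

	strictNodes := map[string]string{liveID: "Pending"}
	err := reconcileStrict(strictNodes, results, order)
	fmt.Println("strict:", err, "live node phase:", strictNodes[liveID])

	lenientNodes := map[string]string{liveID: "Pending"}
	skipped := reconcileLenient(lenientNodes, results, order)
	fmt.Println("lenient skipped:", skipped, "live node phase:", lenientNodes[liveID])
}
```

In the strict variant the live node stays Pending because the stale entry is seen first, which is consistent with the hang reported here. The lenient variant only hides the stale entry, so cleaning up the old TaskSet data is still worth doing.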
PR #12620 fixes the WorkflowTaskSet patch bug where the node statuses were not successfully deleted, following the original code logic.
Instead of fixing the patch bug, it would be much simpler to delete the entire WorkflowTaskSet after the workflow is completed. What do you think?
Still a similar result on http retry. After the retry it hangs between steps.
Before retry:
After retry:
@wesleyscholl Can you paste a small workflow to reproduce it?
metadata:
  name: test-failure-test
  generateName: test-failure-test-
  namespace: new-setup
spec:
  templates:
    - name: start-test-fail
      inputs: {}
      outputs: {}
      metadata: {}
      steps:
        - - name: get-stores-status
            template: http-retry
            arguments:
              parameters:
                - name: url
                  value: http://httpstat.us/Random/400-404,500-504
    - name: http-retry
      inputs:
        parameters:
          - name: url
      outputs: {}
      metadata: {}
      http:
        method: GET
        url: '{{inputs.parameters.url}}'
        timeoutSeconds: 20
        successCondition: response.statusCode == 200
      retryStrategy:
        limit: 3
        retryPolicy: Always
        backoff:
          duration: 10s
          factor: 1
          maxDuration: 10m
  entrypoint: start-test-fail
  arguments: {}
@wesleyscholl Sorry, I can't reproduce it with the workflow you provided. Which version are you currently using? Have you tested it on v3.5.5 or above?
@jswxstw @shuangkun @toyamagu-2021 Apologies, I forgot to include the exit handler and we are using v3.5.4.
Reproducible (run the workflow, then retry the HTTP template, and it hangs):
metadata:
  name: test-failure-test
spec:
  templates:
    - name: start-test-fail
      steps:
        - - name: get-status
            template: http-retry
            arguments:
              parameters:
                - name: url
                  value: http://httpstat.us/Random/400-404,500-504
    - name: http-retry
      inputs:
        parameters:
          - name: url
      http:
        method: GET
        url: '{{inputs.parameters.url}}'
        timeoutSeconds: 20
        successCondition: response.statusCode == 200
      retryStrategy:
        limit: 3
        retryPolicy: Always
        backoff:
          duration: 10s
          factor: 1
          maxDuration: 10m
    - name: exit-handler
      steps:
        - - name: log-workflow-status
            template: log-message
        - - name: conditional-alert
            template: log-message
            when: '{{workflow.status}} != "Succeeded"'
    - name: log-message
      container:
        name: ''
        image: alpine:latest
        command:
          - echo
  entrypoint: start-test-fail
  arguments: {}
  onExit: exit-handler
Please try v3.5.5 or above, this issue has been fixed.
Confirmed this has been fixed in v3.5.7, I am able to retry the workflow task sets (http retry). We will upgrade our argo workflows version. Thanks!
Upgraded to v3.5.5 and tested. Confirmed that the manual retry works for workflow task sets. 👍🏻
https://github.com/argoproj/argo-workflows/assets/128409641/916667d8-062b-43a4-9b2d-6bd7fa039a14
Fixed by #12620 based on the above comment. Thanks all!