argo-workflows
Argo workflows - controller crashes if bad workflow is run
Checklist
- [x] Double-checked my configuration.
- [x] Tested using the latest version.
- [ ] Used the Emissary executor.
Summary
What happened/what you expected to happen?
Expectation: Argo Workflows should report the bad configuration to the user.
What happened: the workflow controller crashes with OOMKilled.
What version are you running?
3.3.8
Diagnostics
Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: this-crashes-argo-workflows-controller
spec:
  entrypoint: nested-workflow-example
  templates:
    - name: nested-workflow-example
      steps:
        - - name: generate
            template: nested-workflow-example
# Logs from the workflow controller:
time="2022-07-23T18:30:41.730Z" level=info msg="Updated phase -> Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.730Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: " namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.730Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.730Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.730Z" level=debug msg="Event(v1.ObjectReference{Kind:\"Workflow\", Namespace:\"argo\", Name:\"this-crashes-argo-workflows-sc575\", UID:\"c9627b67-7b19-4193-99a6-e97c9150f68d\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1226203089\", FieldPath:\"\"}): type: 'Normal' reason: 'WorkflowRunning' Workflow Running"
time="2022-07-23T18:30:41.730Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: " namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="Steps node this-crashes-argo-workflows-sc575 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-720580794 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-3463303851 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-3463303851" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-2775766343 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-3463303851" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-3463303851" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-2559107108 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-2559107108" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-1275372750 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-2559107108" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-2559107108" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-1279650431 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-1279650431" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-610496875 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-1279650431" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-1279650431" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-818332680 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-818332680" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-216752370 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-818332680" namespace=argo workflow=this-crashes-argo-workflows-sc575
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
@corneredrat we can add validation for cyclic reference
@sarabala1979 will try, thanks..
@corneredrat Do you have any ETA to work on this? If you need any help let us know.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
We also need a solution for this issue.
@bengoldenberg Can you describe your use case? The use case above is considered a user error. @corneredrat are you looking into this issue?
Hi @sarabala1979 I have started looking into the issue. Will share a PR in a week or so
When someone creates an incorrect workflow template structure, the Argo UI doesn't show any option to select and throws an HTTP request error.
Hi @sarabala1979, in the controller (steps.go), would adding the following logic suffice to stop the infinite loop?
if stepsCtx.scope.tmpl.Name == step.Template {
    // set node status as failed / invalid
}
This is what I got from a first pass through the code. Note that this check only catches a template that references itself directly. Please suggest a better alternative that would cover all the use cases that may end up putting the workflow into an infinite loop.
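To illustrate what a broader check might look like, here is a minimal sketch of cycle detection over template references using a depth-first search. It is not the controller's code: `Step`, `Template`, and `findUnconditionalCycle` are hypothetical, simplified stand-ins (real Argo steps are nested lists of parallel groups, and DAG tasks are not modelled). The sketch assumes that a step guarded by a `when` expression may terminate at runtime, so only unconditional references are followed. That assumption lets intentionally recursive workflows pass while both direct and indirect unconditional cycles are reported.

```go
package main

import "fmt"

// Step and Template are simplified, hypothetical stand-ins for the Argo types.
type Step struct {
	Name     string
	Template string // name of the template this step invokes
	When     string // non-empty means the step is conditional
}

type Template struct {
	Name  string
	Steps []Step
}

// findUnconditionalCycle walks the template graph from entrypoint and returns
// an error for the first cycle made only of unconditional references, or nil.
func findUnconditionalCycle(templates []Template, entrypoint string) error {
	byName := make(map[string]Template, len(templates))
	for _, t := range templates {
		byName[t.Name] = t
	}

	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int) // missing keys read as unvisited

	var visit func(name string, path []string) error
	visit = func(name string, path []string) error {
		switch state[name] {
		case inProgress:
			// name is already on the current DFS path: a cycle.
			return fmt.Errorf("cyclic template reference: %v -> %s", path, name)
		case done:
			return nil
		}
		state[name] = inProgress
		path = append(path, name)
		for _, s := range byName[name].Steps {
			if s.When != "" {
				continue // assumed: guarded recursion can terminate at runtime
			}
			if err := visit(s.Template, path); err != nil {
				return err
			}
		}
		state[name] = done
		return nil
	}
	return visit(entrypoint, nil)
}
```

Called with the reproduction template from this issue (one template `nested-workflow-example` whose only step references itself with no `when`), the function returns a non-nil error. Whether rejecting such workflows at validation time is acceptable for every use case is a design question for the maintainers.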
Workaround: delete the bad workflow.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.
The example Workflow causes infinite recursion. Duplicate of #4180, #11499, #11497.
There is a recursion depth limit now as of #11646, which effectively fixes this. There may be better detection options for direct recursion (as in the example Workflow here), can follow #11497 for that specific variation.