Argo workflows - controller crashes if bad workflow is run

Open corneredrat opened this issue 1 year ago • 10 comments

Checklist

  • [x] Double-checked my configuration.
  • [x] Tested using the latest version.
  • [ ] Used the Emissary executor.

Summary

What happened/what you expected to happen?
Expected: Argo Workflows should report the bad configuration to the user.
Actual: the workflow controller crashes with OOMKilled.

What version are you running? 3.3.8

Diagnostics

Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
 name: this-crashes-argo-workflows-controller
spec:
 entrypoint: nested-workflow-example
 templates:
 - name: nested-workflow-example
   steps:
   - - name: generate
       template: nested-workflow-example

Logs from the workflow controller:

time="2022-07-23T18:30:41.730Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.730Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: " namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.730Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.730Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.730Z" level=debug msg="Event(v1.ObjectReference{Kind:\"Workflow\", Namespace:\"argo\", Name:\"this-crashes-argo-workflows-sc575\", UID:\"c9627b67-7b19-4193-99a6-e97c9150f68d\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1226203089\", FieldPath:\"\"}): type: 'Normal' reason: 'WorkflowRunning' Workflow Running"
time="2022-07-23T18:30:41.730Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: " namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="Steps node this-crashes-argo-workflows-sc575 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-720580794 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-3463303851 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-3463303851" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.731Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-2775766343 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-3463303851" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-3463303851" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-2559107108 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-2559107108" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-1275372750 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-2559107108" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-2559107108" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-1279650431 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-1279650431" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-610496875 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-1279650431" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.NodeStatus (nested-workflow-example)"
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-1279650431" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="Steps node this-crashes-argo-workflows-sc575-818332680 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Initializing node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate[0]: template: *v1alpha1.WorkflowStep invalid (https://argoproj.github.io/argo-workflows/templates/), boundaryID: this-crashes-argo-workflows-sc575-818332680" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=info msg="StepGroup node this-crashes-argo-workflows-sc575-216752370 initialized Running" namespace=argo workflow=this-crashes-argo-workflows-sc575
time="2022-07-23T18:30:41.732Z" level=debug msg="Evaluating node this-crashes-argo-workflows-sc575[0].generate[0].generate[0].generate[0].generate[0].generate: template: *v1alpha1.WorkflowStep (nested-workflow-example), boundaryID: this-crashes-argo-workflows-sc575-818332680" namespace=argo workflow=this-crashes-argo-workflows-sc575

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

corneredrat avatar Jul 24 '22 17:07 corneredrat

@corneredrat we can add validation for cyclic reference

sarabala1979 avatar Jul 25 '22 14:07 sarabala1979

> @corneredrat we can add validation for cyclic reference

@sarabala1979 will try, thanks.

corneredrat avatar Jul 25 '22 16:07 corneredrat

@corneredrat Do you have an ETA for working on this? If you need any help, let us know.

sarabala1979 avatar Jul 26 '22 15:07 sarabala1979

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Aug 12 '22 04:08 stale[bot]

We also need a solution for this issue.

bengoldenberg avatar Aug 14 '22 22:08 bengoldenberg

@bengoldenberg Can you describe your use case? The example above is considered a user error. @corneredrat, are you looking into this issue?

sarabala1979 avatar Aug 15 '22 04:08 sarabala1979

Hi @sarabala1979, I have started looking into the issue. I will share a PR in a week or so.

corneredrat avatar Aug 16 '22 03:08 corneredrat

> @bengoldenberg Can you describe your use case? The example above is considered a user error. @corneredrat, are you looking into this issue?

When someone creates a workflow template with an incorrect structure, the Argo UI doesn't show any option to select and throws an HTTP request error.

bengoldenberg avatar Aug 17 '22 10:08 bengoldenberg

Hi @sarabala1979, in the controller's steps.go, would adding the following logic suffice to stop the infinite loop?

if stepsCtx.scope.tmpl.Name == step.Template {
	// set node status as failed / invalid
}

This is what I got from a first read-through of the code. Please suggest a better alternative that would cover all the use cases that may end up putting the workflow in an infinite loop.
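
For illustration only, a more general check than the name comparison above could walk the graph of template references and look for any cycle, direct or indirect. The sketch below is not the controller's code: `Step`, `Template` and `FindUnconditionalCycle` are hypothetical, simplified types and names. It also assumes that a step guarded by a `when` expression may legitimately recurse, so only unguarded references are followed.

```go
package main

import (
	"fmt"
	"strings"
)

// Step is a hypothetical, simplified stand-in for a workflow step:
// the template it calls and an optional `when` guard.
type Step struct {
	Template string
	When     string
}

// Template is a hypothetical, simplified template: a name plus its steps.
type Template struct {
	Name  string
	Steps []Step
}

// FindUnconditionalCycle runs a depth-first search from entrypoint and
// returns the first chain of template names that loops back on itself
// through unguarded steps, or nil if no such chain is reachable.
func FindUnconditionalCycle(templates []Template, entrypoint string) []string {
	byName := make(map[string]Template, len(templates))
	for _, t := range templates {
		byName[t.Name] = t
	}

	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int)
	var path []string

	var visit func(name string) []string
	visit = func(name string) []string {
		switch state[name] {
		case inProgress:
			// name is already on the current path: report path[i:] + name.
			for i, n := range path {
				if n == name {
					cycle := append([]string{}, path[i:]...)
					return append(cycle, name)
				}
			}
		case done:
			return nil
		}
		state[name] = inProgress
		path = append(path, name)
		for _, s := range byName[name].Steps {
			if s.When != "" {
				continue // assumption: guarded recursion can terminate
			}
			if cycle := visit(s.Template); cycle != nil {
				return cycle
			}
		}
		path = path[:len(path)-1]
		state[name] = done
		return nil
	}
	return visit(entrypoint)
}

func main() {
	// Mirrors the WorkflowTemplate from the report: a step that calls its own template.
	templates := []Template{{
		Name:  "nested-workflow-example",
		Steps: []Step{{Template: "nested-workflow-example"}},
	}}
	if cycle := FindUnconditionalCycle(templates, "nested-workflow-example"); cycle != nil {
		fmt.Println("cyclic template reference:", strings.Join(cycle, " -> "))
	}
}
```

A check like this would run once at validation time, before any node is created, so the bad workflow could be rejected with an error message instead of being expanded.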

corneredrat avatar Aug 21 '22 16:08 corneredrat

Workaround: delete the bad workflow.

alexec avatar Sep 05 '22 20:09 alexec

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 01 '22 06:10 stale[bot]

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

stale[bot] avatar Oct 16 '22 00:10 stale[bot]

The example Workflow causes infinite recursion. Duplicate of #4180, #11499, #11497.

There is a recursion depth limit now as of #11646, which effectively fixes this. There may be better detection options for direct recursion (as in the example Workflow here); follow #11497 for that specific variation.

agilgur5 avatar Sep 12 '23 21:09 agilgur5
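
As a rough illustration of how a recursion depth limit turns unbounded expansion into an ordinary error, here is a minimal sketch. It is not the code from #11646: `Template`, `Expand`, `ErrMaxDepthExceeded` and the limit of 100 are all hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMaxDepthExceeded is a hypothetical sentinel error for this sketch.
var ErrMaxDepthExceeded = errors.New("maximum recursion depth exceeded")

// Template is a hypothetical, simplified template: a name and the templates it calls.
type Template struct {
	Name  string
	Calls []string
}

// Expand walks the calls reachable from name and returns how many nodes
// were created. It fails as soon as depth goes past maxDepth, so a
// self-referencing template stops with an error instead of growing
// until memory runs out.
func Expand(templates map[string]Template, name string, depth, maxDepth int) (int, error) {
	if depth > maxDepth {
		return 0, fmt.Errorf("template %q at depth %d: %w", name, depth, ErrMaxDepthExceeded)
	}
	nodes := 1
	for _, callee := range templates[name].Calls {
		n, err := Expand(templates, callee, depth+1, maxDepth)
		if err != nil {
			return 0, err
		}
		nodes += n
	}
	return nodes, nil
}

func main() {
	// Mirrors the reported template, which calls itself unconditionally.
	templates := map[string]Template{
		"nested-workflow-example": {
			Name:  "nested-workflow-example",
			Calls: []string{"nested-workflow-example"},
		},
	}
	_, err := Expand(templates, "nested-workflow-example", 0, 100)
	if errors.Is(err, ErrMaxDepthExceeded) {
		fmt.Println("workflow failed:", err)
	}
}
```

A depth limit does not distinguish intentional deep recursion from a mistake, which is why explicit detection of direct recursion may still be worth having on top of it.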