Workflow hangs and is never marked completed when a sub-DAG fails to resolve its output parameter

Open • tczhao opened this issue 10 months ago • 0 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [X] I can confirm the issue exists when I tested with :latest
  • [X] I have searched existing issues and could not find a match for this bug
  • [ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

The workflow should be marked Errored/Failed when an inner DAG template fails. Instead, it hangs and never completes.

Version

latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loop-test-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
        - name: print-json-entry-print-exitcode
          template: print-json-entry-print-exitcode
          arguments:
            parameters:
            - name: index
              value: '0'
        - name: call-access-aggregate-output
          depends: "print-json-entry-print-exitcode"
          template: access-aggregate-output
          arguments:
            parameters:
            - name: aggregate-results
              value: '{{tasks.print-json-entry-print-exitcode.outputs.parameters.exit-code}}'
  - name: print-json-entry-print-exitcode
    inputs:
      parameters:
        - name: index
    outputs:
      parameters:
        - name: exit-code
          valueFrom:
            parameter: "{{tasks.print-exitcode.outputs.result}}"
    dag:
      tasks:
        - name: print-json-entry
          template: print-json-entry
          arguments:
            parameters:
            - name: index
              value: '{{inputs.parameters.index}}'
        - name: print-exitcode
          depends: "print-json-entry.Failed"
          template: print-exitcode
          arguments:
            parameters:
            - name: exitcode
              value: '{{tasks.print-json-entry.exitCode}}'
  - name: print-json-entry
    inputs:
      parameters:
      - name: index
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo intentional failure; exit {{inputs.parameters.index}}"]
  - name: access-aggregate-output
    inputs:
      parameters:
      - name: aggregate-results
        value: 'no-value'
    script:
      image: alpine:latest
      command: [sh]
      source: |
        echo 'inputs.parameters.aggregate-results: "{{inputs.parameters.aggregate-results}}"'
  - name: print-exitcode
    inputs:
      parameters:
      - name: exitcode
        value: ''
    script:
      image: alpine:latest
      command: [sh]
      source: |
        echo '{{inputs.parameters.exitcode}}'
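
As the manifest reads, `index` is '0', so `print-json-entry` exits with code 0 and succeeds. The `print-exitcode` task depends on `print-json-entry.Failed`, so it never runs, and the inner DAG's `exit-code` output, which references `{{tasks.print-exitcode.outputs.result}}`, has nothing to resolve against. A possible stopgap, sketched below and not tested against this bug, is to give the output parameter a `valueFrom.default`. The assumption is that the controller falls back to `default` for DAG-level outputs when the referenced task result is missing; the value "0" is only a placeholder chosen for illustration.

```yaml
# Hypothetical variant of the print-json-entry-print-exitcode template from the
# reproduction above. Only the `default` line is new; the rest is unchanged.
- name: print-json-entry-print-exitcode
  inputs:
    parameters:
      - name: index
  outputs:
    parameters:
      - name: exit-code
        valueFrom:
          parameter: "{{tasks.print-exitcode.outputs.result}}"
          # Assumption: used when the reference above cannot be resolved,
          # for example because print-exitcode never ran.
          default: "0"
  dag:
    tasks:
      - name: print-json-entry
        template: print-json-entry
        arguments:
          parameters:
            - name: index
              value: '{{inputs.parameters.index}}'
      - name: print-exitcode
        depends: "print-json-entry.Failed"
        template: print-exitcode
        arguments:
          parameters:
            - name: exitcode
              value: '{{tasks.print-json-entry.exitCode}}'
```

This only sidesteps the unresolved reference. It does not change the expectation in this report that the workflow should end as Errored/Failed rather than hang when the output cannot be resolved.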

Logs from the workflow controller

GitHub says the comment is too long, but you can submit the workflow above to reproduce the issue.
Here are the last few lines from the controller log:


time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Found stored template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (workflow-template-whalesay-template/whalesay-template#false)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Getting the template by name: whalesay" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=debug msg="Getting the template by name: whalesay" base="*v1alpha1.Workflow (namespace=,name=)" tmpl="*v1alpha1.WorkflowStep (whalesay)"
time="2024-04-02T04:30:00.001Z" level=info msg="delightful-poochenheimer is suspended, skipping execution" namespace=argo workflow=delightful-poochenheimer
time="2024-04-02T04:30:00.004Z" level=debug msg="Patch cronworkflows 200"
time="2024-04-02T04:30:00.004Z" level=debug msg="Patch cronworkflows 200"
time="2024-04-02T04:30:00.112Z" level=info msg="cleaning up pod" action=killContainers key=argo/loop-test-pwszv-print-json-entry-2612912699/killContainers
time="2024-04-02T04:30:00.273Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=286082 namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="task result:\n&WorkflowTaskResult{ObjectMeta:{loop-test-pwszv-2612912699  argo  8c15dd8c-4999-4df5-a2f9-d2e96e14f732 286074 2 2024-04-02 04:29:53 +0000 UTC <nil> <nil> map[workflows.argoproj.io/report-outputs-completed:true workflows.argoproj.io/workflow:loop-test-pwszv] map[] [{argoproj.io/v1alpha1 Workflow loop-test-pwszv 5e453b4e-ca11-4530-b2ba-8d2e28a2072f <nil> <nil>}] [] [{argoexec Update argoproj.io/v1alpha1 2024-04-02 04:29:56 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:workflows.argoproj.io/report-outputs-completed\":{},\"f:workflows.argoproj.io/workflow\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5e453b4e-ca11-4530-b2ba-8d2e28a2072f\\\"}\":{}}},\"f:outputs\":{\".\":{},\"f:artifacts\":{}}} }]},NodeResult:NodeResult{Phase:,Message:,Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{Artifact{Name:main-logs,Path:,Mode:nil,From:,ArtifactLocation:ArtifactLocation{ArchiveLogs:nil,S3:&S3Artifact{S3Bucket:S3Bucket{Endpoint:,Bucket:,Region:,Insecure:nil,AccessKeySecret:nil,SecretKeySecret:nil,RoleARN:,UseSDKCreds:false,CreateBucketIfNotPresent:nil,EncryptionOptions:nil,CASecret:nil,},Key:loop-test-pwszv/loop-test-pwszv-print-json-entry-2612912699/main.log,},Git:nil,HTTP:nil,Artifactory:nil,HDFS:nil,Raw:nil,OSS:nil,GCS:nil,Azure:nil,},GlobalName:,Archive:nil,Optional:false,SubPath:,RecurseMode:false,FromExpression:,ArtifactGC:nil,Deleted:false,},},Result:nil,ExitCode:nil,},Progress:,},}" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="task result name:\nloop-test-pwszv-2612912699" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Marking task result complete loop-test-pwszv-2612912699" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=info msg="task-result changed" namespace=argo nodeID=loop-test-pwszv-2612912699 workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Skipping artifact GC" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Evaluating node loop-test-pwszv: template: *v1alpha1.WorkflowStep (main), boundaryID: " namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.WorkflowStep (main)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.WorkflowStep (main)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template by name: main" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.WorkflowStep (main)"
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="Executing node loop-test-pwszv of DAG is Running" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.DAGTask (access-aggregate-output)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.DAGTask (access-aggregate-output)"
time="2024-04-02T04:30:00.274Z" level=debug msg="Getting the template by name: access-aggregate-output" base="*v1alpha1.Workflow (namespace=argo,name=loop-test-pwszv)" tmpl="*v1alpha1.DAGTask (access-aggregate-output)"
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=debug msg="unresolved is allowed " error=unresolved
time="2024-04-02T04:30:00.274Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=loop-test-pwszv
time="2024-04-02T04:30:00.274Z" level=info msg=reconcileAgentPod namespace=argo workflow=loop-test-pwszv


Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

Posted by tczhao, Apr 02 '24 04:04