
fix: make sure taskresult is completed when marking a node succeeded if it has outputs

Open · shuangkun opened this issue 5 months ago · 9 comments

When my cluster runs a large number of workflows, I get errors like this:

="Mark error node" error="failed to evaluate expression: cannot fetch steps-init-artifact from <nil> (1:6)\n | steps['init-artifact'].outputs.parameters['workflow_artifact_key']\n | .....^" namespace=argo nodeName="workflow-bhr9k[3].energy(0:0)[1].energy-steps(0:0)[3].comp-binding-energy-steps(0:0)[15]" workflow=workflow-bhr9k

When only a few workflows are running, the error does not occur.

My workflows contain many templates like the one below, in which each step refers to an output of the previous step. For example, hello2a references hello1's output through the expression steps['hello1'].outputs.parameters['workflow_artifact_key']:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello
  arguments:
    parameters:
    - name: message1
      value: hello world
    - name: message2
      value: foobar
  # This spec contains two templates: hello-hello-hello and whalesay
  templates:
  - name: hello-hello-hello
    # Instead of just running a container
    # This template has a sequence of steps
    steps:
    - - name: hello1            # hello1 is run before the following steps
        continueOn: {}
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
          - name: workflow_artifact_key
            value: "{{ workflow.parameters.message2}}"
    - - name: hello2a           # double dash => run after previous step
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "{{=steps['hello1'].outputs.parameters['workflow_artifact_key']}}"

  # Template run by each step; it declares workflow_artifact_key as an output parameter
  - name: whalesay
    metadata:
      annotations:
        k8s.aliyun.com/eci-spot-strategy: "SpotAsPriceGo"
    inputs:
      parameters:
      - name: message
    outputs:
      parameters:
      - name: workflow_artifact_key
        value: '{{workflow.name}}'
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        i = random.randint(1, 100)
        print(i)
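To illustrate why hello2a fails when hello1's outputs are missing, here is a simplified Python model of the lookup that the expression steps['hello1'].outputs.parameters['workflow_artifact_key'] performs. It is a sketch under stated assumptions, not Argo's code: the dict layout, the helper name resolve_output_param, and the value "steps-abc12" (standing in for whatever {{workflow.name}} resolves to) are all hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical, simplified model of the data that hello2a's expression reads.
# This is NOT Argo's implementation; the dict layout is an assumption.
Scope = Dict[str, Dict[str, Any]]


def resolve_output_param(steps: Scope, step: str, param: str) -> str:
    """Mimic steps[step].outputs.parameters[param] over plain dicts."""
    node = steps.get(step)
    outputs: Optional[Dict[str, Any]] = node.get("outputs") if node else None
    if outputs is None:
        # The step may already be Succeeded, but its outputs were never recorded.
        raise LookupError(f"outputs of step {step!r} are not available yet")
    return outputs["parameters"][param]


# hello1 is marked Succeeded before its task result has been processed.
too_early: Scope = {"hello1": {"phase": "Succeeded", "outputs": None}}

# Once the task result is applied, the output parameter is present.
# "steps-abc12" is a hypothetical stand-in for {{workflow.name}}.
in_order: Scope = {
    "hello1": {
        "phase": "Succeeded",
        "outputs": {"parameters": {"workflow_artifact_key": "steps-abc12"}},
    }
}

if __name__ == "__main__":
    print(resolve_output_param(in_order, "hello1", "workflow_artifact_key"))
    try:
        resolve_output_param(too_early, "hello1", "workflow_artifact_key")
    except LookupError as err:
        print("evaluation failed:", err)
```

In this model the phase alone says nothing about whether outputs can be read; only the presence of the recorded outputs does.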

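Searching the logs, I found that the "node changed" event that marks the previous step (hello1) as Succeeded is logged earlier than the corresponding "task-result changed" event. As a result, hello1's outputs have not been recorded yet when hello2a's arguments are evaluated, and the expression fails. So I want to make sure that, when a node has outputs, its task result is completed before the node is marked as succeeded.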
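The proposed ordering (do not mark a node with outputs as Succeeded until its task result is complete) can be sketched as a small guard. This is a hedged Python model, not the controller's actual Go code: the classes TaskResult and Node, the completed flag, and the function try_mark_succeeded are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskResult:
    """Hypothetical stand-in for the outputs a step's pod reports back."""
    completed: bool                          # assumed flag: reporting has finished
    outputs: Optional[Dict[str, str]] = None


@dataclass
class Node:
    name: str
    phase: str = "Running"
    has_outputs: bool = False                # the template declares outputs
    outputs: Optional[Dict[str, str]] = None


def try_mark_succeeded(node: Node, result: Optional[TaskResult]) -> bool:
    """Mark the node Succeeded only once its outputs are known to be complete.

    Returns False, leaving the node untouched, when the caller should retry
    on a later reconciliation.
    """
    if node.has_outputs:
        if result is None or not result.completed:
            return False  # outputs not recorded yet: do not expose a Succeeded node
        node.outputs = result.outputs
    node.phase = "Succeeded"
    return True


if __name__ == "__main__":
    hello1 = Node("hello1", has_outputs=True)
    # The pod has exited, but its task result has not been fully reported.
    print(try_mark_succeeded(hello1, TaskResult(completed=False)), hello1.phase)
    # A later reconciliation sees the completed task result.
    done = TaskResult(completed=True, outputs={"workflow_artifact_key": "steps-abc12"})
    print(try_mark_succeeded(hello1, done), hello1.phase, hello1.outputs)
```

The design choice here is to defer rather than fail: returning False keeps the node in its current phase so a later pass can finish the transition, and nodes without outputs are unaffected.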

Motivation

Modifications

Verification

shuangkun · Jan 17 '24 14:01