
dag task failed with error msg "converting YAML to JSON: yaml: invalid map key"

thinkhoon opened this issue 1 year ago · 4 comments

Checklist

  • [x] Double-checked my configuration.
  • [x] Tested using the latest version.
  • [x] Used the Emissary executor.

Summary

What happened/what you expected to happen?

Hello, I am using a DAG workflow in Argo. Task A writes three files that are exposed as output parameters, and task B uses them as its input parameters. Sometimes both A and B work fine, but sometimes task B fails with "error converting YAML to JSON: yaml: invalid map key: map[interface {}]interface {}", and the Argo Server UI (see the screenshot below) shows that task B's input parameters were never resolved to their actual values and still display the raw template. This looks like the same problem as https://github.com/argoproj/argo-workflows/issues/5960.
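A likely mechanism, sketched as an illustration (not confirmed in this thread): if the tasks.A.outputs.* parameters never resolve, the raw {{ ... }} text is left in the Job manifest, and that literal text is what then gets parsed as YAML:

    # Unresolved placeholder left verbatim in the resource manifest:
    completions: {{ inputs.parameters.sc_job }}
    # YAML parses the unquoted {{ ... }} as a flow mapping whose key is itself a
    # mapping; converting that to JSON fails with
    # "invalid map key: map[interface {}]interface {}".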

What version are you running?

3.3.8

Diagnostics

Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.

metadata:
  name: epl-eftpjz89j
  generateName: epl-eftp
  namespace: argo
spec:
  templates:
    - name: diamond
      inputs: {}
      outputs: {}
      metadata: {}
      dag:
        tasks:
          - name: A
            template: create-queue
            arguments: {}
          - name: B
            template: run-label
            arguments:
              parameters:
                - name: sc_job
                  value: '{{tasks.A.outputs.parameters.sc_job}}'
                - name: para_job
                  value: '{{tasks.A.outputs.parameters.para_job}}'
            depends: createqueuet
    - name: create-queue
      inputs: {}
      outputs:
        parameters:
          - name: sc_job
            valueFrom:
              path: /sc_job
          - name: p_class
            valueFrom:
              path: /p_class
          - name: para_job
            valueFrom:
              path: /para_job
      metadata: {}
      container:
        name: ''
        image: 'ftp-autolabeling-dask:main'
        command:
          - python3.7
          - /app/src/workflow/util/create_task_queue.py
        resources: {}
        imagePullPolicy: Always
    
    - name: run-label
      inputs:
        parameters:
          - name: sc_job
          - name: para_job
      outputs: {}
      metadata: {}
      resource:
        action: create
        manifest: |
          apiVersion: batch/v1
          kind: Job
          metadata:
            generateName: epl-{{workflow.parameters.task_name}}-
          spec:
            ttlSecondsAfterFinished: 259200
            backoffLimit: 10000
            completions: {{ inputs.parameters.sc_job }}
            parallelism: {{ inputs.parameters.para_job }}
            template:
              metadata:
                annotations:
                creationTimestamp: null
              spec:
                nodeSelector:
                  cac_mode: cpu
                containers:
                  - name: container-p8gnk7
                    image: 'ftp-autolabeling-dask:main'
                    command:
                      - bash
                    args:
                      - /app/resource/script/start_epl_dis_task_runner.sh
                      - "{{workflow.parameters.batch_size}}"
                      - "{{workflow.parameters.batch_para_num}}"
                      - {{workflow.parameters.task_name}}
                      - {{workflow.parameters.branch}}
                    imagePullPolicy: Always
                restartPolicy: Never
                dnsPolicy: ClusterFirst
                serviceAccountName: default
                serviceAccount: default
                securityContext: {}
                schedulerName: default-scheduler
        setOwnerReference: true
        successCondition: 'status.succeeded == {{ inputs.parameters.sc_job  }}'
        failureCondition: status.failed > 10000
  entrypoint: diamond

[Screenshot: Argo Server UI showing task B's input parameters still displaying the raw {{tasks.A.outputs.parameters.*}} templates instead of resolved values]


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

thinkhoon avatar Aug 09 '22 02:08 thinkhoon

Thanks for submitting. Definitely seems like a bug if the behavior is inconsistent from run to run.

juliev0 avatar Aug 10 '22 20:08 juliev0

@thinkhoon Can you check task A's output during the failure? It looks like A's output is invalid and cannot be marshalled.

sarabala1979 avatar Aug 11 '22 15:08 sarabala1979

Hi Sara, the output is definitely correct: the Kubernetes Job (which needs task A's output parameter to define the number of pods that must succeed) is created correctly and runs fine.

thinkhoon avatar Aug 12 '22 03:08 thinkhoon

Fix this: depends: createqueuet is wrong; there is no task with that name in the DAG.
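For reference, a sketch of the corrected task B (assuming the intended upstream task is A, the only task in the posted workflow that produces sc_job and para_job):

          - name: B
            template: run-label
            arguments:
              parameters:
                - name: sc_job
                  value: '{{tasks.A.outputs.parameters.sc_job}}'
                - name: para_job
                  value: '{{tasks.A.outputs.parameters.para_job}}'
            depends: A

With depends pointing at a real task name, the DAG waits for A to finish, so the tasks.A.outputs.parameters.* placeholders can resolve before the run-label manifest is rendered. Separately, quoting the remaining unquoted placeholders in the manifest (for example "{{workflow.parameters.task_name}}" in args) keeps the rendered Job valid YAML even if a value contains characters that YAML treats specially.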

alexec avatar Sep 05 '22 20:09 alexec

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 01 '22 17:10 stale[bot]

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

stale[bot] avatar Oct 16 '22 00:10 stale[bot]