argo-workflows icon indicating copy to clipboard operation
argo-workflows copied to clipboard

Failed to resolve {{workflow.annotations.xxx}} when annotations are defined in workflowDefaults

Open sylock opened this issue 1 year ago • 7 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [X] I can confirm the issues exists when I tested with :latest
  • [ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

The issue

  • you define an annotation in the workflow-controller-configmap section workflowDefaults
  • you make a reference to this annotation in the workflow at any place with {{workflow.annotations.xxx}} where xxx is the name annotation
  • when trying to submit the workflowTemplate argo refuses it with the error: ERROR - Workflow <workflow.yaml>: templates.exit.steps failed to resolve {{workflow.annotations.xxx}}

This is really annoying because I wanted to use annotations to get some behavior in the exit-handler. I will use something else, but I think it should be supported.

Expected behavior

labels and annotations can be dynamic by definition, this should raise a warning but not an error.

Version

v3.4.9

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

---
# In the workflow-controller-configmap
  workflowDefaults: |
    metadata:
      annotations:
        xxx: hello test
---
# The workflow
metadata:
  name: delightful-tiger
  labels:
    example: 'true'
spec:
  entrypoint: argosay
  templates:
    - name: argosay
      inputs:
        parameters:
          - name: message
            value: '{{workflow.annotations.xxx}}'
      container:
        name: main
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - '{{inputs.parameters.message}}'

Logs from the UI or argo-server

Bad Request: templates.argosay: failed to resolve {{workflow.annotations.xxx}}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

sylock avatar Jul 28 '23 09:07 sylock

Could you test if {{inputs.parameters.message}} works in steps or dag type workflow?

shmruin avatar Jul 28 '23 15:07 shmruin

If you want to explore this on your own in the code, {{workflow.annotations}} are supposed to be getting set here. You can see if the woc.execWf.Spec.WorkflowMetadata that it's using here is set as you expect.

To confirm, your ConfigMap looks something like this? (copied from here):

# This file describes the config settings available in the workflow controller configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  # Default values that will apply to all Workflows from this controller, unless overridden on the Workflow-level
  workflowDefaults: |
    metadata:
      annotations:
        xxxx: yyy

juliev0 avatar Aug 03 '23 16:08 juliev0

Yes, my configmap is like you say. The thing is: the workflowDefaults are set at the runtime, when the pod is executed. But I get the error while pushing the workflow template through the API (from my CI/CD). At that moment, the workflow template does not have these values set. So the linter refuses it.

sylock avatar Aug 11 '23 13:08 sylock

I'm sorry to hear you ran into this bug. Do you know if this was an issue on v3.3?

Also, would this workaround be viable for you?

Create a workflow level exit handler and use kustomize/similar tool to apply the exit-handler to all WorkflowTemplates labeled with appType: apply-logic:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: my-exit-handler
spec:
  onExit: exit-handler
  templates:
    - name: exit-handler
      container:
        image: alpine:latest
        command: [ sh, -c ]
        # TODO implement actual handling
        args: [ "echo workflow exited with error!" ]

Apply this patch based on a label: kustomization.yaml

patches:
  - path: exit-handler.yaml
    target:
      kind: WorkflowTemplate
      labelSelector: appType=apply-logic

Then label each app I want this applied to:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: etp-logic-app
  labels:
    appType: apply-logic

caelan-io avatar Aug 31 '23 17:08 caelan-io

We're running into this issue trying to set the AWS account ID as a workflowDefaults annotation so that workflows can reference this annotation to determine which account they are running in, instead of fetching it at runtime (we have an Argo Workflows instance per account). More generally it seems like the workflowDefaults feature can't be used for annotations at all with this bug, since any references will fail at the validation stage.

I looked through the code and found that there are two separate places where workflow.annotations.* is populated. One is in validate.go and seems to run within the main server:

https://github.com/argoproj/argo-workflows/blob/595ab4bdecd57844120f401af4797d4be09698ca/workflow/validate/validate.go#L221-L225

And the other is in operator.go and seems to run within in the workflow controller:

https://github.com/argoproj/argo-workflows/blob/595ab4bdecd57844120f401af4797d4be09698ca/workflow/controller/operator.go#L654-L660

I patched the image to print a stack trace when this error is triggered, and obtained this stack trace, which shows the error is triggered from validation in the main server rather than the workflow controller. I noticed that the workflow controller populates workflow template parameters from defaults:

https://github.com/argoproj/argo-workflows/blob/595ab4bdecd57844120f401af4797d4be09698ca/workflow/controller/controller.go#L1128-L1133

But the validation logic in the main server seems to have no equivalent, leading to a discrepancy between the two paths that triggers the error.

What's even weirder is that there is a workaround. If we refer to the annotation as {{ workflow.annotations.awsAccountId }} instead of {{workflow.annotations.awsAccountId}} then everything works (even though these forms should obviously be equivalent). This suggests that there is a second bug (see https://github.com/argoproj/argo-workflows/issues/11767) which bypasses validation depending on the whitespace in the templating, which happens to work around the bug in this thread. This is further evidence that the workflow defaults work fine, it is just the validation logic that is overzealous.

I'm not sure what the right approach to fixing this is. It seems as if the main server needs to have access to configuration data from the workflow controller in order for the validation logic to work properly.

Note: This issue appears to be the same as https://github.com/argoproj/argo-workflows/issues/10946.

raxod502-plaid avatar Sep 06 '23 19:09 raxod502-plaid

Thank you for digging into this and providing more information @raxod502-plaid.

For all who are following, please add upvotes to this and #10946 to help a fix for this get prioritized. We will do our best to get to this issue. PRs welcome in the meantime if anyone is open to digging in further.

caelan-io avatar Sep 07 '23 13:09 caelan-io

Since https://github.com/argoproj/argo-workflows/issues/11767 is fixed now, there is no workaround for this issue; upgrading for 3.5.x breaks all workflows that use steps that rely on workflow defaults.

EDIT: turns out replacing simple tags like {{workflow.annotations.awsAccountId}} with expressions like {{= workflow.annotations.awsAccountId}} bypasses validation and things are working again 🥲

diversario avatar Jan 05 '24 13:01 diversario