Can't use template interpolation in mutex name in DAG
Checklist
- [x] Double-checked my configuration.
- [x] Tested using the latest version.
- [x] Used the Emissary executor.
Summary
Related merged PR: https://github.com/argoproj/argo-workflows/pull/3957
What happened/what you expected to happen?
As a workaround for this issue: https://github.com/argoproj/argo-workflows/issues/9394, we tried using a sprig function to replace the slash in our interpolated mutex name, since using a slash would result in a Workflow operation error. We tried this:
'{{=sprig.replace("/", "-", inputs.parameters.repository)}}'
But this also resulted in the same Workflow operation error, but at a different point in the process than the other issue (truncated here):
time="2022-08-18T01:33:52.861Z" level=error msg="Recovered from panic" namespace=provisioner r="bug: lock that was just encoded does not match encoding" stack="goroutine 409 [running]:\nruntime/debug.Stack()\n\t/usr/lo
What version are you running?
Argo Workflows 3.3.8
Diagnostics
Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
name: unique-name-for-grep
namespace: default
spec:
entrypoint: diamond
templates:
- name: diamond
dag:
tasks:
- name: A
template: mutex
arguments:
parameters:
- name: message
value: foo/bar
- name: B
depends: A
template: mutex
arguments:
parameters:
- name: message
value: foo/bar
- name: mutex
synchronization:
mutex:
name: '{{=sprig.replace("/", "-", inputs.parameters.message)}}'
inputs:
parameters:
- name: message
container:
image: alpine:3.7
command: [sh, -c, "echo {{inputs.parameters.message}}"]
I narrowed this down to a Release call being made without template rendering here: https://github.com/argoproj/argo-workflows/blob/0a2d71410c8288bcb91cc1edf14074c1212e14a7/workflow/controller/dag.go#L373
I've also PRed a proposed fix: https://github.com/argoproj/argo-workflows/pull/9405
# Logs from the workflow controller:
$ kubectl logs -n argo deploy/workflow-controller | grep unique-name-for-grep
Defaulted container "workflow-controller" out of: workflow-controller, fluent-bit
time="2022-08-19T06:15:39.620Z" level=info msg="Processing workflow" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.626Z" level=info msg="Updated phase -> Running" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.626Z" level=info msg="DAG node unique-name-for-grep initialized Running" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.626Z" level=info msg="All of node unique-name-for-grep.A dependencies [] completed" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.626Z" level=info msg="default/Mutex/foo-bar acquired by default/unique-name-for-grep/unique-name-for-grep-899913450 " mutex=default/Mutex/foo-bar
time="2022-08-19T06:15:39.626Z" level=info msg="Node unique-name-for-grep.A acquired synchronization lock" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.626Z" level=info msg="Pod node unique-name-for-grep-899913450 initialized Pending" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.705Z" level=info msg="Created pod: unique-name-for-grep.A (unique-name-for-grep-899913450)" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.705Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.705Z" level=info msg=reconcileAgentPod namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:39.717Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=756240871 workflow=unique-name-for-grep
time="2022-08-19T06:15:49.706Z" level=info msg="Processing workflow" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:49.706Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=unique-name-for-grep
time="2022-08-19T06:15:49.706Z" level=info msg="node changed" new.message= new.phase=Succeeded new.progress=0/1 nodeID=unique-name-for-grep-899913450 old.message= old.phase=Pending old.progress=0/1
time="2022-08-19T06:15:49.707Z" level=error msg="Recovered from panic" namespace=default r="bug: lock that was just encoded does not match encoding" stack="goroutine 388 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate.func2()\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:194 +0xd4\npanic({0x1bf9440, 0x225c110})\n\t/usr/local/go/src/runtime/panic.go:1047 +0x266\ngithub.com/argoproj/argo-workflows/v3/workflow/sync.(*LockName).ValidateEncoding(0xc1487a6440, {0xc00c733450, 0x45})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/sync/lock_name.go:101 +0x14b\ngithub.com/argoproj/argo-workflows/v3/workflow/sync.(*LockName).EncodeName(0xc1487a6440)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/sync/lock_name.go:74 +0x10d\ngithub.com/argoproj/argo-workflows/v3/workflow/sync.(*Manager).Release(0xc000998420, 0xc00d6b6000, {0xc016a7a080, 0x0}, 0xc0101915a0)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/sync/sync_manager.go:190 +0x138\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeDAGTask(0xc001382b40, {0x22b0668, 0xc000056030}, 0xc0023eb6c0, {0x331d148, 0x1})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/dag.go:373 +0x6c6\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeDAGTask(0xc001382b40, {0x22b0668, 0xc000056030}, 0xc0023eb6c0, {0x331d150, 0x1})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/dag.go:438 +0x1f25\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeDAG(0xc001382b40, {0x22b0668, 0xc000056030}, {0xc0123dcd38, 0x1e8da00}, 0xc1487a6340, {0xc016a7a680, 0x1a}, 0xc0b5396d80, {0x2292360, ...}, ...)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/dag.go:244 +0x433\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeTemplate(0xc001382b40, {0x22b0668, 0xc000056030}, {0xc0123dcd38, 0x14}, {0x2292360, 0xc001382d80}, 0xc00169daa0, {{0x0, 0x0, ...}, ...}, ...)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1889 +0x232c\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate(0xc001382b40, {0x22b0668, 0xc000056030})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:350 +0x16a8\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem(0xc000580c00, {0x22b0668, 0xc000056030})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:756 +0x8ee\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker(0x0)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:678 +0x9e\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f4ac4319d00)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x67\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x226dbe0, 0xc000c7e000}, 0x1, 0xc00015e060)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x89\nk8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x25\ncreated by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:294 +0x1a6c\n" workflow=unique-name-for-grep
time="2022-08-19T06:15:49.707Z" level=info msg="Updated phase Running -> Error" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:49.707Z" level=info msg="Updated message -> bug: lock that was just encoded does not match encoding" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:49.707Z" level=info msg="Marking workflow completed" namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:49.707Z" level=info msg="Checking daemoned children of " namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:49.707Z" level=info msg="Lock has been released by default/unique-name-for-grep/unique-name-for-grep-899913450. Available locks: 1" mutex=default/Mutex/foo-bar
time="2022-08-19T06:15:49.707Z" level=info msg="default/unique-name-for-grep/unique-name-for-grep-899913450 released a lock from default/Mutex/foo-bar"
time="2022-08-19T06:15:49.707Z" level=info msg="Released all acquired locks" key=unique-name-for-grep namespace=default workflow=unique-name-for-grep
time="2022-08-19T06:15:49.712Z" level=info msg="cleaning up pod" action=deletePod key=default/unique-name-for-grep-1340600742-agent/deletePod
time="2022-08-19T06:15:49.718Z" level=info msg="default/unique-name-for-grep/unique-name-for-grep-899913450 released a lock from default/Mutex/foo-bar"
time="2022-08-19T06:15:49.718Z" level=info msg="Queueing Error workflow default/unique-name-for-grep for delete in 120h0m0s due to TTL"
time="2022-08-19T06:15:49.720Z" level=info msg="Workflow update successful" namespace=default phase=Error resourceVersion=756241285 workflow=unique-name-for-grep
time="2022-08-19T06:15:49.732Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/unique-name-for-grep-899913450/labelPodCompleted
# If the workflow's pods have not been created, you can skip the rest of the diagnostics.
# The workflow's pods that are problematic:
$ kubectl get pod -o yaml -l workflows.argoproj.io/workflow=unique-name-for-grep,workflow.argoproj.io/phase!=Succeeded
apiVersion: v1
items:
- apiVersion: v1
kind: Pod
metadata:
annotations:
kubectl.kubernetes.io/default-container: main
kubernetes.io/psp: restricted
seccomp.security.alpha.kubernetes.io/pod: runtime/default
workflows.argoproj.io/node-id: unique-name-for-grep-899913450
workflows.argoproj.io/node-name: unique-name-for-grep.A
creationTimestamp: "2022-08-19T06:15:39Z"
labels:
workflows.argoproj.io/completed: "true"
workflows.argoproj.io/workflow: unique-name-for-grep
name: unique-name-for-grep-899913450
namespace: default
ownerReferences:
- apiVersion: argoproj.io/v1alpha1
blockOwnerDeletion: true
controller: true
kind: Workflow
name: unique-name-for-grep
uid: 366a4af7-1696-48e4-bef2-7a07616c26c2
resourceVersion: "756241290"
uid: 7de3d321-8a39-424f-833e-07d09c80b66c
spec:
activeDeadlineSeconds: 28799
containers:
- command:
- argoexec
- wait
- --loglevel
- info
env:
- name: ARGO_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ARGO_POD_UID
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.uid
- name: ARGO_CONTAINER_RUNTIME_EXECUTOR
value: emissary
- name: GODEBUG
value: x509ignoreCN=0
- name: ARGO_WORKFLOW_NAME
value: unique-name-for-grep
- name: ARGO_CONTAINER_NAME
value: wait
- name: ARGO_TEMPLATE
value: '{"name":"mutex","inputs":{"parameters":[{"name":"message","value":"foo/bar"}]},"outputs":{},"metadata":{},"container":{"name":"","image":"alpine:3.7","command":["sh","-c","echo
foo/bar"],"resources":{}},"synchronization":{"mutex":{"name":"foo-bar"}}}'
- name: ARGO_NODE_ID
value: unique-name-for-grep-899913450
- name: ARGO_INCLUDE_SCRIPT_OUTPUT
value: "false"
- name: ARGO_DEADLINE
value: "2022-08-19T14:15:39Z"
- name: ARGO_PROGRESS_FILE
value: /var/run/argo/progress
- name: ARGO_PROGRESS_PATCH_TICK_DURATION
value: 1m0s
- name: ARGO_PROGRESS_FILE_TICK_DURATION
value: 3s
image: quay.io/argoproj/argoexec:v3.3.8
imagePullPolicy: IfNotPresent
name: wait
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- FSETID
- NET_RAW
- SETFCAP
- SETPCAP
- SYS_PTRACE
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/argo
name: var-run-argo
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-b7pd9
readOnly: true
- command:
- /var/run/argo/argoexec
- emissary
- --
- sh
- -c
- echo foo/bar
env:
- name: ARGO_CONTAINER_NAME
value: main
- name: ARGO_TEMPLATE
value: '{"name":"mutex","inputs":{"parameters":[{"name":"message","value":"foo/bar"}]},"outputs":{},"metadata":{},"container":{"name":"","image":"alpine:3.7","command":["sh","-c","echo
foo/bar"],"resources":{}},"synchronization":{"mutex":{"name":"foo-bar"}}}'
- name: ARGO_NODE_ID
value: unique-name-for-grep-899913450
- name: ARGO_INCLUDE_SCRIPT_OUTPUT
value: "false"
- name: ARGO_DEADLINE
value: "2022-08-19T14:15:39Z"
- name: ARGO_PROGRESS_FILE
value: /var/run/argo/progress
- name: ARGO_PROGRESS_PATCH_TICK_DURATION
value: 1m0s
- name: ARGO_PROGRESS_FILE_TICK_DURATION
value: 3s
image: alpine:3.7
imagePullPolicy: IfNotPresent
name: main
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- FSETID
- NET_RAW
- SETFCAP
- SETPCAP
- SYS_PTRACE
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/argo
name: var-run-argo
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-b7pd9
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- command:
- argoexec
- init
- --loglevel
- info
env:
- name: ARGO_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ARGO_POD_UID
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.uid
- name: ARGO_CONTAINER_RUNTIME_EXECUTOR
value: emissary
- name: GODEBUG
value: x509ignoreCN=0
- name: ARGO_WORKFLOW_NAME
value: unique-name-for-grep
- name: ARGO_CONTAINER_NAME
value: init
- name: ARGO_TEMPLATE
value: '{"name":"mutex","inputs":{"parameters":[{"name":"message","value":"foo/bar"}]},"outputs":{},"metadata":{},"container":{"name":"","image":"alpine:3.7","command":["sh","-c","echo
foo/bar"],"resources":{}},"synchronization":{"mutex":{"name":"foo-bar"}}}'
- name: ARGO_NODE_ID
value: unique-name-for-grep-899913450
- name: ARGO_INCLUDE_SCRIPT_OUTPUT
value: "false"
- name: ARGO_DEADLINE
value: "2022-08-19T14:15:39Z"
- name: ARGO_PROGRESS_FILE
value: /var/run/argo/progress
- name: ARGO_PROGRESS_PATCH_TICK_DURATION
value: 1m0s
- name: ARGO_PROGRESS_FILE_TICK_DURATION
value: 3s
image: quay.io/argoproj/argoexec:v3.3.8
imagePullPolicy: IfNotPresent
name: init
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- FSETID
- NET_RAW
- SETFCAP
- SETPCAP
- SYS_PTRACE
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/argo
name: var-run-argo
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-b7pd9
readOnly: true
nodeName: ip-10-89-151-88.us-west-2.compute.internal
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext:
fsGroup: 1
seccompProfile:
type: RuntimeDefault
supplementalGroups:
- 1
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- emptyDir: {}
name: var-run-argo
- name: kube-api-access-b7pd9
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2022-08-19T06:15:42Z"
reason: PodCompleted
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2022-08-19T06:15:39Z"
reason: PodCompleted
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2022-08-19T06:15:39Z"
reason: PodCompleted
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2022-08-19T06:15:39Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://5187cd95717d296321e02671ce80568ac5c46265f4c6a761578250356d6d2047
image: alpine:3.7
imageID: docker-pullable://alpine@sha256:8421d9a84432575381bfabd248f1eb56f3aa21d9d7cd2511583c68c9b7511d10
lastState: {}
name: main
ready: false
restartCount: 0
started: false
state:
terminated:
containerID: docker://5187cd95717d296321e02671ce80568ac5c46265f4c6a761578250356d6d2047
exitCode: 0
finishedAt: "2022-08-19T06:15:44Z"
reason: Completed
startedAt: "2022-08-19T06:15:44Z"
- containerID: docker://8ef681d306bc61fd4fae26af9e811c7745d92e350edcc0cae7bac837a7d919ed
image: quay.io/argoproj/argoexec:v3.3.8
imageID: docker-pullable://quay.io/argoproj/argoexec@sha256:ba2be5594962f60f5a4ece6b16b9fc359280fff983cabef35ac18edcaeb97609
lastState: {}
name: wait
ready: false
restartCount: 0
started: false
state:
terminated:
containerID: docker://8ef681d306bc61fd4fae26af9e811c7745d92e350edcc0cae7bac837a7d919ed
exitCode: 0
finishedAt: "2022-08-19T06:15:44Z"
reason: Completed
startedAt: "2022-08-19T06:15:42Z"
hostIP: 10.89.151.88
initContainerStatuses:
- containerID: docker://c9579aeede3d346a4c4e5a1174568d8f7d95d502f79f57bb66efaf287e1310c7
image: quay.io/argoproj/argoexec:v3.3.8
imageID: docker-pullable://quay.io/argoproj/argoexec@sha256:ba2be5594962f60f5a4ece6b16b9fc359280fff983cabef35ac18edcaeb97609
lastState: {}
name: init
ready: true
restartCount: 0
state:
terminated:
containerID: docker://c9579aeede3d346a4c4e5a1174568d8f7d95d502f79f57bb66efaf287e1310c7
exitCode: 0
finishedAt: "2022-08-19T06:15:41Z"
reason: Completed
startedAt: "2022-08-19T06:15:41Z"
phase: Succeeded
podIP: 100.64.191.199
podIPs:
- ip: 100.64.191.199
qosClass: BestEffort
startTime: "2022-08-19T06:15:39Z"
kind: List
metadata:
resourceVersion: ""
# Logs from in your workflow's wait container, something like:
$ kubectl logs -c wait -l workflows.argoproj.io/workflow=unique-name-for-grep,workflow.argoproj.io/phase!=Succeeded
time="2022-08-19T06:15:42.749Z" level=info msg="Creating a emissary executor"
time="2022-08-19T06:15:42.749Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2022-08-19T06:15:42.749Z" level=info msg="Executor initialized" deadline="2022-08-19 14:15:39 +0000 UTC" includeScriptOutput=false namespace=default podName=unique-name-for-grep-899913450 template="{\"name\":\"mutex\",\"inputs\":{\"parameters\":[{\"name\":\"message\",\"value\":\"foo/bar\"}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:3.7\",\"command\":[\"sh\",\"-c\",\"echo foo/bar\"],\"resources\":{}},\"synchronization\":{\"mutex\":{\"name\":\"foo-bar\"}}}" version="&Version{Version:v3.3.8,BuildDate:2022-06-23T23:46:23Z,GitCommit:621b0d1a8e09634666ebe403ee7b8fc29db1dc4e,GitTag:v3.3.8,GitTreeState:clean,GoVersion:go1.17.11,Compiler:gc,Platform:linux/amd64,}"
time="2022-08-19T06:15:42.749Z" level=info msg="Starting deadline monitor"
time="2022-08-19T06:15:44.750Z" level=info msg="Main container completed"
time="2022-08-19T06:15:44.750Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-08-19T06:15:44.750Z" level=info msg="No output parameters"
time="2022-08-19T06:15:44.750Z" level=info msg="No output artifacts"
time="2022-08-19T06:15:44.750Z" level=info msg="Killing sidecars []"
time="2022-08-19T06:15:44.751Z" level=info msg="Alloc=5817 TotalAlloc=10411 Sys=19410 NumGC=3 Goroutines=7"
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.