[BUG] Restarting a rollout with workloadRef crashes the operator pod
Describe the bug
There are two ways to create a Rollout in Argo Rollouts:
- The first one uses a single Rollout resource that contains both the deployment spec and the rollout spec. With the new Reloader 1.1.0 release, restarting this rollout works great (the pods are deleted and no new ReplicaSet is created).
- The second one uses two resources: a Rollout that contains the rollout spec and references an existing Deployment via spec.workloadRef, and the referenced Deployment that contains the deployment spec. Restarting this rollout makes the controller crash.
To Reproduce
Change the secret that reloads the Rollout deployed via workloadRef.
Expected behavior
It works the same way as with a single Rollout resource.
Environment
- Operator Version: v1.1.0
- Kubernetes/OpenShift Version: 1.29.6
Log (via stern)
reloader-7df5bc9f58-phgjj reloader E0918 08:33:15.113227 1 runtime.go:79] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
reloader-7df5bc9f58-phgjj reloader goroutine 12 [running]:
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19f31e0?, 0xc001912c48})
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x85
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000cc6380?})
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x6b
reloader-7df5bc9f58-phgjj reloader panic({0x19f31e0?, 0xc001912c48?})
reloader-7df5bc9f58-phgjj reloader /usr/local/go/src/runtime/panic.go:914 +0x21f
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:382 +0x238
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.updatePodAnnotations({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:399 +0xec
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.invokeReloadStrategy({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:392 +0x12d
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.PerformAction({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:222 +0x16b0
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.rollingUpgrade({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:184 +0x128
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.doRollingUpgrade({{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, 0xc0006b4cf0, {0x1b33cca, 0x23}, {0x1b30b76, 0x21}, {0xc000b3e3f0, ...}, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:173 +0xb0c
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.ResourceUpdatedHandler.Handle({{0x1add080, 0xc000688140}, {0x1add080, 0xc001892b40}, {0xc00043c208, 0xc00043c220}, {0x7f056fc315e0, 0xc000437d00}})
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/update.go:32 +0x265
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/controller.(*Controller).processNextItem(0xc000432d80)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/controller/controller.go:246 +0xd0
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/controller.(*Controller).runWorker(...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/controller/controller.go:230
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000114000?, {0x1da1620, 0xc00019baa0}, 0x1, 0xc000534000)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x9bfa40?)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.Until(0xc00035b7d0?, 0x9b8525?, 0xc000254600?)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x1e
reloader-7df5bc9f58-phgjj reloader created by github.com/stakater/Reloader/internal/pkg/controller.(*Controller).Run in goroutine 66
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/controller/controller.go:215 +0x259
reloader-7df5bc9f58-phgjj reloader panic: runtime error: index out of range [0] with length 0 [recovered]
reloader-7df5bc9f58-phgjj reloader panic: runtime error: index out of range [0] with length 0
reloader-7df5bc9f58-phgjj reloader
reloader-7df5bc9f58-phgjj reloader goroutine 12 [running]:
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000cc6380?})
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xcd
reloader-7df5bc9f58-phgjj reloader panic({0x19f31e0?, 0xc001912c48?})
reloader-7df5bc9f58-phgjj reloader /usr/local/go/src/runtime/panic.go:914 +0x21f
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:382 +0x238
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.updatePodAnnotations({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:399 +0xec
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.invokeReloadStrategy({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:392 +0x12d
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.PerformAction({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:222 +0x16b0
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.rollingUpgrade({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:184 +0x128
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.doRollingUpgrade({{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, 0xc0006b4cf0, {0x1b33cca, 0x23}, {0x1b30b76, 0x21}, {0xc000b3e3f0, ...}, ...}, ...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/upgrade.go:173 +0xb0c
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/handler.ResourceUpdatedHandler.Handle({{0x1add080, 0xc000688140}, {0x1add080, 0xc001892b40}, {0xc00043c208, 0xc00043c220}, {0x7f056fc315e0, 0xc000437d00}})
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/handler/update.go:32 +0x265
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/controller.(*Controller).processNextItem(0xc000432d80)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/controller/controller.go:246 +0xd0
reloader-7df5bc9f58-phgjj reloader github.com/stakater/Reloader/internal/pkg/controller.(*Controller).runWorker(...)
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/controller/controller.go:230
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000114000?, {0x1da1620, 0xc00019baa0}, 0x1, 0xc000534000)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x9bfa40?)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
reloader-7df5bc9f58-phgjj reloader k8s.io/apimachinery/pkg/util/wait.Until(0xc00035b7d0?, 0x9b8525?, 0xc000254600?)
reloader-7df5bc9f58-phgjj reloader /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x1e
reloader-7df5bc9f58-phgjj reloader created by github.com/stakater/Reloader/internal/pkg/controller.(*Controller).Run in goroutine 66
reloader-7df5bc9f58-phgjj reloader /workspace/internal/pkg/controller/controller.go:215 +0x259
It seems the code was written for spec.template.spec.containers and not for spec.workloadRef. I believe this would be an enhancement.
PRs are welcome
Hi @MuneebAijaz
I would like to work on this issue. Could you please assign it to me?
Seeing this issue as well -- any ideas when this can be fixed?
@mikly-te I can attempt to fix this, but I'm struggling to reproduce the issue. Could you share the manifest files if possible, so we have a minimal reproducible example?
@wololowarrior here's a stripped-down version of the manifest files, although I'm not sure whether it's minimal.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  annotations:
    secret.reloader.stakater.com/reload: webapps
  name: webapps
spec:
  replicas: 6
  selector:
    matchLabels:
      app: webapps
  strategy:
    canary:
      canaryService: webapps-canary
      scaleDownDelaySeconds: 300
      stableService: webapps
      steps:
        - setCanaryScale:
            weight: 100
        - setWeight: 0
        - pause:
            duration: 5m
        - setWeight: 100
        - pause:
            duration: 1m
      trafficRouting:
        istio:
          virtualService:
            name: webapps
            routes:
              - primary
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapps
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapps
spec:
  replicas: 0
  selector:
    matchLabels:
      name: webapps
  template:
    metadata:
    spec:
      containers:
        - env:
            - name: CONFIG_APP_NAME
              value: webapps
          envFrom:
            - secretRef:
                name: webapps
          image: webapps
          name: application
Thanks @mikly-te ! Appreciate it
I'm also using a similar deployment approach.
For me, Reloader isn't crashing, but it isn't restarting the pods either. I guess that's what the issue is. I'll try to fix this.
Is Reloader crashing for you?
This makes the controller crash
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
  labels:
    app: deployment
spec:
  replicas: 0
  selector:
    matchLabels:
      app: deployment
  template:
    metadata:
      labels:
        app: deployment
    spec:
      containers:
        - image: docker.io/nginx:mainline-alpine3.20-slim
          name: nginx
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: deployment
  labels:
    app: deployment
  annotations:
    secret.reloader.stakater.com/reload: "deployment"
spec:
  selector:
    matchLabels:
      app: deployment
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment
  strategy:
    blueGreen:
      activeService: deployment
      autoPromotionEnabled: false
      activeMetadata:
        labels:
          role: active
---
apiVersion: v1
kind: Service
metadata:
  name: deployment
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
---
apiVersion: v1
kind: Secret
metadata:
  name: deployment
stringData:
  VARIABLE: "whatever"
type: Opaque
Line 418 is the crashing one:
https://github.com/stakater/Reloader/blob/93fb083788f252ff2b7997d25a948f6a9d5f4da7/internal/pkg/handler/upgrade.go#L417-L419
I'm guessing that containers is an empty array, but I haven't yet figured out how that state is reached.
And now I've figured it out. containers is set here:
https://github.com/stakater/Reloader/blob/93fb083788f252ff2b7997d25a948f6a9d5f4da7/internal/pkg/handler/upgrade.go#L387
ContainersFunc is GetRolloutContainers, which assumes the Rollout has a pod template.
When a Rollout uses workloadRef instead, the container array is empty. Indexing into that array causes the panic: index out of range [0] with length 0.
Okteto replay
default:reloader app> dlv debug . -- --is-Argo-Rollouts=true
Type 'help' for list of commands.
(dlv) b ./internal/pkg/handler/upgrade.go:387
Breakpoint 1 set at 0x21d00d7 for github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource() ./internal/pkg/handler/upgrade.go:387
(dlv) c
INFO[0000] Environment: Kubernetes
INFO[0000] Starting Reloader
WARN[0000] KUBERNETES_NAMESPACE is unset, will detect changes in all namespaces.
INFO[0000] created controller for: configMaps
INFO[0000] Starting Controller to watch resource type: configMaps
INFO[0000] created controller for: secrets
INFO[0000] Starting Controller to watch resource type: secrets
In another terminal:
k patch -p '{"data": {"ANTONIO_WAS_HERE": "Cg=="}}' secret/foo && \
k patch --type json -p '[{"op": "remove", "path": "/data/ANTONIO_WAS_HERE"}]' secret/foo
delve hits the breakpoint:
> [Breakpoint 1] github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource() ./internal/pkg/handler/upgrade.go:387 (hits goroutine(63):1 total:1) (PC: 0x21d00d7)
382: return nil
383: }
384:
385: func getContainerUsingResource(upgradeFuncs callbacks.RollingUpgradeFuncs, item runtime.Object, config util.Config, autoReload bool) *v1.Container {
386: volumes := upgradeFuncs.VolumesFunc(item)
=> 387: containers := upgradeFuncs.ContainersFunc(item)
388: initContainers := upgradeFuncs.InitContainersFunc(item)
389: var container *v1.Container
390: // Get the volumeMountName to find volumeMount in container
391: volumeMountName := getVolumeMountName(volumes, config.Type, config.ResourceName)
392: // Get the container with mounted configmap/secret
(dlv) s
> github.com/stakater/Reloader/internal/pkg/callbacks.GetRolloutContainers() ./internal/pkg/callbacks/rolling_upgrade.go:256 (PC: 0x21abd6a)
251: func GetDeploymentConfigContainers(item runtime.Object) []v1.Container {
252: return item.(*openshiftv1.DeploymentConfig).Spec.Template.Spec.Containers
253: }
254:
255: // GetRolloutContainers returns the containers of given rollout
=> 256: func GetRolloutContainers(item runtime.Object) []v1.Container {
257: return item.(*argorolloutv1alpha1.Rollout).Spec.Template.Spec.Containers
258: }
259:
260: // GetDeploymentInitContainers returns the containers of given deployment
261: func GetDeploymentInitContainers(item runtime.Object) []v1.Container {
(dlv) p item.Spec.Template.Spec.Containers
[]k8s.io/api/core/v1.Container len: 0, cap: 0, nil
I don't have ideas for a fix, though:
func GetRolloutContainers(item runtime.Object) []v1.Container {
	rollout := item.(*argorolloutv1alpha1.Rollout)
	if rollout.Spec.WorkloadRef != nil {
		// There's no `kube.Clients` available here,
		// so I can't follow the workload reference
		// to the object that has the pod template.
	}
	return rollout.Spec.Template.Spec.Containers
}
@praddy26 (or @MuneebAijaz), any thoughts on the above?
The bug is still present in 2.20.
With log level trace:
reloader-597cc8775-5ndsp reloader E0806 12:56:15.889468 1 panic.go:115] "Observed a panic" panic="runtime error: index out of range [0] with length 0" panicGoValue="runtime.boundsError{x:0, y:0, signed:true, code:0x0}" stacktrace=
Hi! Thanks all for the discussion and troubleshooting!
I will copy-paste my answer in the PR for visibility:
Hi and thank you for your contribution!
The proposed fix will work in theory, but it will only result in the error NoContainerFound being generated when trying to reload an Argo Rollout with a workloadRef, since we are not actually fetching the underlying workload being referenced.
This is certainly better than the operator crashing, so I think it can be implemented as-is.
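For reference, a minimal sketch of the kind of guard involved; firstContainer is a hypothetical helper, not the actual patch, but it shows the length check that turns the panic into a "no container found" result:

package handler

import v1 "k8s.io/api/core/v1"

// firstContainer is an illustrative helper (not part of Reloader): return nil
// instead of panicking when the workload exposes no pod-template containers,
// so the caller can surface NoContainerFound.
func firstContainer(containers []v1.Container) *v1.Container {
	if len(containers) == 0 {
		return nil
	}
	return &containers[0]
}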
For the full fix we would probably need to fetch the referenced workload and find the container to restart from there. The issue then becomes: "How do we know whether a Deployment is being referenced from an Argo Rollout?" That is, when we get an updated Deployment, we need to know whether to ignore it and trigger the reload via the Argo Rollout, or to trigger the reload based on the "raw" Deployment.
This requires some planning for how to implement it and how to keep track of whether a resource is referenced by an Argo Rollout. We are open to ideas and PRs for sure; a rough sketch of following the reference is below.
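As one possible direction, resolving the reference could look roughly like this. This is a sketch only: getWorkloadRefContainers is hypothetical, and it assumes a Kubernetes client is made available to the callbacks, which is not the case today.

package callbacks

import (
	"context"
	"fmt"

	argorolloutv1alpha1 "github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getWorkloadRefContainers is a hypothetical helper: if the Rollout references
// an external workload, fetch that workload and return its pod-template
// containers; otherwise fall back to the Rollout's own template.
func getWorkloadRefContainers(clients kubernetes.Interface, rollout *argorolloutv1alpha1.Rollout) ([]v1.Container, error) {
	ref := rollout.Spec.WorkloadRef
	if ref == nil {
		return rollout.Spec.Template.Spec.Containers, nil
	}
	// Only apps/v1 Deployments are handled in this sketch.
	if ref.Kind != "Deployment" || ref.APIVersion != "apps/v1" {
		return nil, fmt.Errorf("unsupported workloadRef kind %q", ref.Kind)
	}
	deployment, err := clients.AppsV1().Deployments(rollout.Namespace).Get(context.TODO(), ref.Name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	return deployment.Spec.Template.Spec.Containers, nil
}

Even with something like this, the second half of the question remains: the handler that watches Deployments would still need to know that a given Deployment is only a template for a Rollout, so it does not trigger a second, conflicting reload.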
For now, we will add a line to the documentation stating that Argo Rollouts with workloadRef will not work as expected.
Maybe add an additional annotation to the Rollout, only for these cases?
Something like:
secret.reloader.stakater.com/workloadref: "NAME-OF-THE-DEPLOYMENT"
That annotation could tell Reloader which Deployment to check and whose pods to delete. If only the usual Reloader annotation exists, a merged Rollout/Deployment would be assumed.
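Rough sketch of how that might be interpreted; the annotation key and the helper below are purely hypothetical, nothing in Reloader implements this today:

package handler

import argorolloutv1alpha1 "github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"

// workloadRefAnnotation is the hypothetical annotation proposed above.
const workloadRefAnnotation = "secret.reloader.stakater.com/workloadref"

// reloadTarget returns the kind and name of the object whose pods should be
// restarted: the annotated Deployment if the annotation is set, otherwise the
// Rollout itself (the merged rollout/deployment case).
func reloadTarget(rollout *argorolloutv1alpha1.Rollout) (kind, name string) {
	if target, ok := rollout.Annotations[workloadRefAnnotation]; ok && target != "" {
		return "Deployment", target
	}
	return "Rollout", rollout.Name
}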