
[BUG] Restarting a rollout with workloadRef crashes the operator pod

Open · jorgelon opened this issue 1 year ago • 10 comments

Describe the bug

There are two ways to create a rollout in Argo Rollouts:

  • The first is a single Rollout resource that contains both the deployment spec and the rollout spec. With the new Reloader 1.1.0 release, restarting this kind of rollout works great (the pods are deleted and no new ReplicaSet is created).

  • The second is two resources: a Rollout containing the rollout spec, which references an existing Deployment via spec.workloadRef, and the referenced Deployment containing the deployment spec. Restarting this kind of rollout makes the controller crash.

To Reproduce

Change the secret that reloads the rollout deployed via workloadRef.

Expected behavior

It works the same way as with a single Rollout resource.

Environment

  • Operator Version: v1.1.0
  • Kubernetes/OpenShift Version: 1.29.6

Log (via stern; pod reloader-7df5bc9f58-phgjj, container reloader):

E0918 08:33:15.113227 1 runtime.go:79] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 12 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19f31e0?, 0xc001912c48})
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000cc6380?})
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x19f31e0?, 0xc001912c48?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:382 +0x238
github.com/stakater/Reloader/internal/pkg/handler.updatePodAnnotations({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:399 +0xec
github.com/stakater/Reloader/internal/pkg/handler.invokeReloadStrategy({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:392 +0x12d
github.com/stakater/Reloader/internal/pkg/handler.PerformAction({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:222 +0x16b0
github.com/stakater/Reloader/internal/pkg/handler.rollingUpgrade({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:184 +0x128
github.com/stakater/Reloader/internal/pkg/handler.doRollingUpgrade({{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, 0xc0006b4cf0, {0x1b33cca, 0x23}, {0x1b30b76, 0x21}, {0xc000b3e3f0, ...}, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:173 +0xb0c
github.com/stakater/Reloader/internal/pkg/handler.ResourceUpdatedHandler.Handle({{0x1add080, 0xc000688140}, {0x1add080, 0xc001892b40}, {0xc00043c208, 0xc00043c220}, {0x7f056fc315e0, 0xc000437d00}})
        /workspace/internal/pkg/handler/update.go:32 +0x265
github.com/stakater/Reloader/internal/pkg/controller.(*Controller).processNextItem(0xc000432d80)
        /workspace/internal/pkg/controller/controller.go:246 +0xd0
github.com/stakater/Reloader/internal/pkg/controller.(*Controller).runWorker(...)
        /workspace/internal/pkg/controller/controller.go:230
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000114000?, {0x1da1620, 0xc00019baa0}, 0x1, 0xc000534000)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x9bfa40?)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0xc00035b7d0?, 0x9b8525?, 0xc000254600?)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/stakater/Reloader/internal/pkg/controller.(*Controller).Run in goroutine 66
        /workspace/internal/pkg/controller/controller.go:215 +0x259
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 12 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000cc6380?})
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x19f31e0?, 0xc001912c48?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:382 +0x238
github.com/stakater/Reloader/internal/pkg/handler.updatePodAnnotations({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:399 +0xec
github.com/stakater/Reloader/internal/pkg/handler.invokeReloadStrategy({0x1c33320, 0x1c33308, 0x1c33328, 0x1c33310, 0x1c33318, 0x1c33380, 0x1c33330, {0x1b0f268, 0x7}}, {0x1da78c8, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:392 +0x12d
github.com/stakater/Reloader/internal/pkg/handler.PerformAction({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:222 +0x16b0
github.com/stakater/Reloader/internal/pkg/handler.rollingUpgrade({{0x1dd0348, 0xc00090d1e0}, {0x1da5f00, 0x0}, {0x1da5f28, 0xc000c26c10}}, {{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:184 +0x128
github.com/stakater/Reloader/internal/pkg/handler.doRollingUpgrade({{0xc000c37350, 0x18}, {0xc000986eb0, 0xa}, 0xc0006b4cf0, {0x1b33cca, 0x23}, {0x1b30b76, 0x21}, {0xc000b3e3f0, ...}, ...}, ...)
        /workspace/internal/pkg/handler/upgrade.go:173 +0xb0c
github.com/stakater/Reloader/internal/pkg/handler.ResourceUpdatedHandler.Handle({{0x1add080, 0xc000688140}, {0x1add080, 0xc001892b40}, {0xc00043c208, 0xc00043c220}, {0x7f056fc315e0, 0xc000437d00}})
        /workspace/internal/pkg/handler/update.go:32 +0x265
github.com/stakater/Reloader/internal/pkg/controller.(*Controller).processNextItem(0xc000432d80)
        /workspace/internal/pkg/controller/controller.go:246 +0xd0
github.com/stakater/Reloader/internal/pkg/controller.(*Controller).runWorker(...)
        /workspace/internal/pkg/controller/controller.go:230
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000114000?, {0x1da1620, 0xc00019baa0}, 0x1, 0xc000534000)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x9bfa40?)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0xc00035b7d0?, 0x9b8525?, 0xc000254600?)
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/stakater/Reloader/internal/pkg/controller.(*Controller).Run in goroutine 66
        /workspace/internal/pkg/controller/controller.go:215 +0x259

jorgelon · Sep 18 '24 08:09

It seems the code was written for spec.template.spec.containers and not for spec.workloadRef. This would be an enhancement, I believe. PRs are welcome.

MuneebAijaz · Sep 22 '24 17:09

Hi @MuneebAijaz

I would like to work on this issue. Could you please assign it to me?

praddy26 · Sep 28 '24 20:09

Seeing this issue as well -- any ideas when this can be fixed?

mikly-te · Nov 01 '24 19:11

@mikly-te I can attempt to fix this, but I'm struggling to recreate the issue. Could you share the manifest files, if possible, so we have a minimal reproducible example?

wololowarrior · Nov 02 '24 18:11

@wololowarrior here's a stripped-down version of the manifest files, though I'm not sure whether it's truly minimal.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  annotations:
    secret.reloader.stakater.com/reload: webapps
  name: webapps
spec:
  replicas: 6
  selector:
    matchLabels:
      app: webapps
  strategy:
    canary:
      canaryService: webapps-canary
      scaleDownDelaySeconds: 300
      stableService: webapps
      steps:
        - setCanaryScale:
            weight: 100
        - setWeight: 0
        - pause:
            duration: 5m
        - setWeight: 100
        - pause:
            duration: 1m
      trafficRouting:
        istio:
          virtualService:
            name: webapps
            routes:
              - primary
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapps
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapps
spec:
  replicas: 0
  selector:
    matchLabels:
      name: webapps
  template:
    metadata:
    spec:
      containers:
        - env:
            - name: CONFIG_APP_NAME
              value: webapps
          envFrom:
            - secretRef:
                name: webapps
          image: webapps
          name: application

mikly-te · Nov 04 '24 17:11

Thanks @mikly-te, appreciate it! I'm also using a similar deployment approach.
For me Reloader isn't crashing, but it isn't restarting the pods either. I guess that's what the issue is. Will try to fix this.

Is reloader crashing for you?

wololowarrior · Nov 05 '24 05:11

This makes the controller crash:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
  labels:
    app: deployment
spec:
  replicas: 0
  selector:
    matchLabels:
      app: deployment
  template:
    metadata:
      labels:
        app: deployment
    spec:
      containers:
      - image: docker.io/nginx:mainline-alpine3.20-slim
        name: nginx
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: deployment
  labels:
    app: deployment
  annotations:
    secret.reloader.stakater.com/reload: "deployment"
spec:
  selector:
    matchLabels:
      app: deployment
  workloadRef: 
    apiVersion: apps/v1
    kind: Deployment
    name: deployment
  strategy:
    blueGreen:
      activeService: deployment
      autoPromotionEnabled: false
      activeMetadata:
        labels:
          role: active
---
apiVersion: v1
kind: Service
metadata:
  name: deployment
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
---
apiVersion: v1
kind: Secret
metadata:
  name: deployment
stringData:
  VARIABLE: "whatever"
type: Opaque

jorgelon · Nov 06 '24 10:11

Line 418 is the crashing one: https://github.com/stakater/Reloader/blob/93fb083788f252ff2b7997d25a948f6a9d5f4da7/internal/pkg/handler/upgrade.go#L417-L419

I'm guessing that containers is an empty array, but I haven't yet figured out how that state is reached.

agurgel-te · Mar 20 '25 17:03

And now I've figured it out. containers is set here:

https://github.com/stakater/Reloader/blob/93fb083788f252ff2b7997d25a948f6a9d5f4da7/internal/pkg/handler/upgrade.go#L387

ContainersFunc is GetRolloutContainers, which assumes the Rollout has a pod template.

When a Rollout uses workloadRef instead, the container array is empty. Indexing into that array causes the panic: index out of range [0] with length 0.
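
A minimal sketch of the missing bounds check (this helper is hypothetical, not the actual Reloader code; the real fix would go wherever containers[0] is indexed):

package handler

import (
	v1 "k8s.io/api/core/v1"
)

// firstContainer returns the first container if one exists, nil otherwise.
// Indexing containers[0] directly is what panics when ContainersFunc
// returns an empty slice, as it does for a workloadRef Rollout.
func firstContainer(containers []v1.Container) *v1.Container {
	if len(containers) == 0 {
		// No pod template on the Rollout itself; let the caller
		// surface "no container found" instead of crashing.
		return nil
	}
	return &containers[0]
}

With a guard like this, an update to the secret would log an error instead of killing the operator pod.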

Okteto replay
default:reloader app> dlv debug . -- --is-Argo-Rollouts=true
Type 'help' for list of commands.
(dlv) b ./internal/pkg/handler/upgrade.go:387
Breakpoint 1 set at 0x21d00d7 for github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource() ./internal/pkg/handler/upgrade.go:387
(dlv) c
INFO[0000] Environment: Kubernetes                      
INFO[0000] Starting Reloader                            
WARN[0000] KUBERNETES_NAMESPACE is unset, will detect changes in all namespaces. 
INFO[0000] created controller for: configMaps           
INFO[0000] Starting Controller to watch resource type: configMaps 
INFO[0000] created controller for: secrets              
INFO[0000] Starting Controller to watch resource type: secrets 

In another terminal:

k patch -p '{"data": {"ANTONIO_WAS_HERE": "Cg=="}}' secret/foo && \
k patch --type json -p '[{"op": "remove", "path": "/data/ANTONIO_WAS_HERE"}]' secret/foo

delve hits the breakpoint:

> [Breakpoint 1] github.com/stakater/Reloader/internal/pkg/handler.getContainerUsingResource() ./internal/pkg/handler/upgrade.go:387 (hits goroutine(63):1 total:1) (PC: 0x21d00d7)
   382:		return nil
   383:	}
   384:	
   385:	func getContainerUsingResource(upgradeFuncs callbacks.RollingUpgradeFuncs, item runtime.Object, config util.Config, autoReload bool) *v1.Container {
   386:		volumes := upgradeFuncs.VolumesFunc(item)
=> 387:		containers := upgradeFuncs.ContainersFunc(item)
   388:		initContainers := upgradeFuncs.InitContainersFunc(item)
   389:		var container *v1.Container
   390:		// Get the volumeMountName to find volumeMount in container
   391:		volumeMountName := getVolumeMountName(volumes, config.Type, config.ResourceName)
   392:		// Get the container with mounted configmap/secret
(dlv) s
> github.com/stakater/Reloader/internal/pkg/callbacks.GetRolloutContainers() ./internal/pkg/callbacks/rolling_upgrade.go:256 (PC: 0x21abd6a)
   251:	func GetDeploymentConfigContainers(item runtime.Object) []v1.Container {
   252:		return item.(*openshiftv1.DeploymentConfig).Spec.Template.Spec.Containers
   253:	}
   254:	
   255:	// GetRolloutContainers returns the containers of given rollout
=> 256:	func GetRolloutContainers(item runtime.Object) []v1.Container {
   257:		return item.(*argorolloutv1alpha1.Rollout).Spec.Template.Spec.Containers
   258:	}
   259:	
   260:	// GetDeploymentInitContainers returns the containers of given deployment
   261:	func GetDeploymentInitContainers(item runtime.Object) []v1.Container {
(dlv) p item.Spec.Template.Spec.Containers
[]k8s.io/api/core/v1.Container len: 0, cap: 0, nil

I don't have ideas for a fix, though:

func GetRolloutContainers(item runtime.Object) []v1.Container {
	rollout := item.(*argorolloutv1alpha1.Rollout)
	if rollout.Spec.WorkloadRef != nil {
		// There's no `kube.Clients` available here,
		// so I can't follow the workload reference
		// to the object that has the pod template.
	}
	return rollout.Spec.Template.Spec.Containers
}
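
One possible direction, sketched under the assumption that a clientset could be threaded into the callbacks (the real ContainersFunc only receives the runtime.Object, which is exactly the problem noted above):

package callbacks

import (
	"context"

	argorolloutv1alpha1 "github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getRolloutContainersViaWorkloadRef is a sketch, not the real callback:
// it follows spec.workloadRef to the referenced Deployment and returns
// that Deployment's containers. Only Deployments are handled here;
// ReplicaSets and other referenceable kinds would need their own lookups.
func getRolloutContainersViaWorkloadRef(client kubernetes.Interface, rollout *argorolloutv1alpha1.Rollout) []v1.Container {
	ref := rollout.Spec.WorkloadRef
	if ref == nil {
		// No workloadRef: the pod template lives on the Rollout itself.
		return rollout.Spec.Template.Spec.Containers
	}
	if ref.Kind != "Deployment" {
		return nil
	}
	deployment, err := client.AppsV1().Deployments(rollout.Namespace).Get(context.TODO(), ref.Name, metav1.GetOptions{})
	if err != nil {
		return nil // lookup failed; caller must tolerate an empty result
	}
	return deployment.Spec.Template.Spec.Containers
}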

agurgel-te · Mar 20 '25 20:03

@praddy26 (or @MuneebAijaz), any thoughts on the above?

agurgel-te · Mar 25 '25 22:03

The bug is still present in 2.20.

With loglevel trace:

reloader-597cc8775-5ndsp reloader E0806 12:56:15.889468 1 panic.go:115] "Observed a panic" panic="runtime error: index out of range [0] with length 0" panicGoValue="runtime.boundsError{x:0, y:0, signed:true, code:0x0}" stacktrace=

jorgelon · Aug 06 '25 12:08

Hi! Thanks all for the discussion and troubleshooting!

I will copy-paste my answer from the PR here for visibility:

Hi and thank you for your contribution!

The proposed fix will work in theory, but it will only result in the error NoContainerFound being generated when trying to reload an Argo Rollout with a WorkloadRef, since we are not actually fetching the underlying workload being referenced.

This is definitely better than the operator crashing, so I think it can be implemented as-is.

For the full fix, we would probably need to fetch the referenced workload and find the container to restart from there. The issue then becomes: "How do we know if a Deployment is being referenced from an Argo Rollout?" That is, when we get an updated Deployment, we need to know whether to ignore it and trigger the reload via the Argo Rollout, or trigger the reload based on the "raw" Deployment.

This requires some planning for how to implement it and how to keep track of whether a resource is referenced by an Argo Rollout. We are open to ideas and PRs for sure.

For now we will add a line to the documentation noting that Argo Rollouts with WorkloadRef will not work as expected.
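
For the tracking question, one sketch (none of these types exist in Reloader today; dedup and deletion handling are omitted) is an index from each referenced Deployment to the Rollouts that point at it, populated from the Rollout watch and consulted before reloading a "raw" Deployment:

package handler

import (
	"sync"

	argorolloutv1alpha1 "github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
)

// workloadRefIndex maps "namespace/name" of a referenced Deployment to
// the names of Rollouts that point at it via spec.workloadRef.
type workloadRefIndex struct {
	mu    sync.RWMutex
	byRef map[string][]string
}

func newWorkloadRefIndex() *workloadRefIndex {
	return &workloadRefIndex{byRef: map[string][]string{}}
}

// Upsert records a Rollout's workloadRef; called from the Rollout watch.
func (i *workloadRefIndex) Upsert(rollout *argorolloutv1alpha1.Rollout) {
	ref := rollout.Spec.WorkloadRef
	if ref == nil || ref.Kind != "Deployment" {
		return
	}
	key := rollout.Namespace + "/" + ref.Name
	i.mu.Lock()
	defer i.mu.Unlock()
	i.byRef[key] = append(i.byRef[key], rollout.Name)
}

// ReferencedBy answers "is this Deployment owned by a Rollout?" so the
// Deployment reload path can defer to the Rollout instead.
func (i *workloadRefIndex) ReferencedBy(namespace, name string) []string {
	i.mu.RLock()
	defer i.mu.RUnlock()
	return i.byRef[namespace+"/"+name]
}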

Felix-Stakater · Aug 11 '25 10:08

Maybe we could add an additional annotation to the rollout, only for these cases?

Something like:

secret.reloader.stakater.com/workloadref: "NAME-OF-THE-DEPLOYMENT"

That annotation would tell Reloader which deployment to check so it can delete that deployment's pods. If a typical Reloader annotation exists, a merged rollout/deployment is assumed.
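
For illustration, a consumer of that annotation might look roughly like this (the annotation key is taken from the suggestion above; the helper itself is hypothetical):

package handler

import (
	argorolloutv1alpha1 "github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1"
)

// workloadRefAnnotation is the proposed (hypothetical) annotation key.
const workloadRefAnnotation = "secret.reloader.stakater.com/workloadref"

// referencedDeploymentName returns the Deployment named by the proposed
// annotation, or "" when it is absent, in which case a merged
// rollout/deployment would be assumed, as described above.
func referencedDeploymentName(rollout *argorolloutv1alpha1.Rollout) string {
	return rollout.GetAnnotations()[workloadRefAnnotation]
}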

jorgelon · Nov 10 '25 13:11