
piped panics when using `K8S_PRIMARY_ROLLOUT` with `spec.planner.alwaysUsePipeline: true` and its stage option `prune: true` on v0.47.3-rc0

Opened by ffjlabo

What happened:

When adding a k8s app whose app.pipecd.yaml sets spec.planner.alwaysUsePipeline: true and includes a K8S_PRIMARY_ROLLOUT stage with the option prune: true, piped fails with the panic below. The piped keeps failing until the deployment is canceled on the UI.

found out 3 valid unregistered applications in repository "ffjlabo-dev"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x18 pc=0x103eda994]

goroutine 6677 [running]:
github.com/pipe-cd/pipecd/pkg/app/piped/executor/kubernetes.(*deployExecutor).ensurePrimaryRollout(0x140020c0e00, {0x1049e8f98, 0x14002003310})
	/Users/s14218/oss/pipe-cd/pipecd/pkg/app/piped/executor/kubernetes/primary.go:161 +0xc34
github.com/pipe-cd/pipecd/pkg/app/piped/executor/kubernetes.(*deployExecutor).Execute(0x140020c0e00, {0x1049e9580, 0x1400218c360})
	/Users/s14218/oss/pipe-cd/pipecd/pkg/app/piped/executor/kubernetes/kubernetes.go:147 +0xb2c
github.com/pipe-cd/pipecd/pkg/app/piped/controller.(*scheduler).executeStage(0x140001d7188, {0x1049e9580, 0x1400218c360}, {{{}, {}, {}, 0x0}, 0x0, {0x0, 0x0, ...}, ...}, ...)
	/Users/s14218/oss/pipe-cd/pipecd/pkg/app/piped/controller/scheduler.go:541 +0xde4
github.com/pipe-cd/pipecd/pkg/app/piped/controller.(*scheduler).Run.func2()
	/Users/s14218/oss/pipe-cd/pipecd/pkg/app/piped/controller/scheduler.go:300 +0xb8
created by github.com/pipe-cd/pipecd/pkg/app/piped/controller.(*scheduler).Run in goroutine 6674
	/Users/s14218/oss/pipe-cd/pipecd/pkg/app/piped/controller/scheduler.go:299 +0xc50
exit status 2

When the prune option is enabled, K8S_PRIMARY_ROLLOUT requires the running commit of the previous deployment in order to detect resources that are no longer defined in Git.

	// Wait for all applied manifests to be stable.
	// In theory, we don't need to wait for them to be stable before going to the next step
	// but waiting for a while reduces the number of Kubernetes changes in a short time.
	e.LogPersister.Info("Waiting for the applied manifests to be stable")
	select {
	case <-time.After(15 * time.Second):
		break
	case <-ctx.Done():
		break
	}

	// Find the running resources that are not defined in Git.
	e.LogPersister.Info("Start finding all running PRIMARY resources but no longer defined in Git")
	// Load running manifests at the most successful deployed commit.
	e.LogPersister.Infof("Loading running manifests at commit %s for handling", e.Deployment.RunningCommitHash)
	ds, err := e.RunningDSP.Get(ctx, e.LogPersister) 

https://github.com/pipe-cd/pipecd/blob/812907842c79b6c26f48fc56cf533d9a47cc24a2/pkg/app/piped/executor/kubernetes/primary.go#L141-L165

But no such commit exists when the app is first added, so e.RunningDSP is nil and dereferencing it causes the panic.

This bug was introduced by the fix in https://github.com/pipe-cd/pipecd/pull/4916.
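The failure mode and the guard that would prevent it can be sketched with a minimal, self-contained Go example. Note that `provider` and `rollout` below are hypothetical names standing in for `e.RunningDSP` and `ensurePrimaryRollout`; this is an illustration of the nil-check pattern, not PipeCD's actual fix.

```go
package main

import "fmt"

// provider mimics the shape of e.RunningDSP: a pointer that is nil
// on the very first deployment of an application (hypothetical type).
type provider struct{ commit string }

func (p *provider) Get() string { return p.commit }

// rollout sketches the guarded behavior: when prune is requested but
// no running commit exists, fail the stage with an error instead of
// dereferencing a nil pointer and crashing the whole piped process.
func rollout(running *provider, prune bool) (string, error) {
	if !prune {
		return "applied without pruning", nil
	}
	if running == nil {
		// First deployment: there is no previously deployed commit yet.
		return "", fmt.Errorf("prune requires a previously deployed running commit")
	}
	return "pruned against " + running.Get(), nil
}

func main() {
	// First deployment: running datasource is nil, prune enabled.
	if _, err := rollout(nil, true); err != nil {
		fmt.Println("stage failed:", err)
	}
	// Later deployment: a running commit exists, pruning can proceed.
	msg, _ := rollout(&provider{commit: "8129078"}, true)
	fmt.Println(msg)
}
```

With the guard in place the first deployment fails as a stage error that the scheduler can report, rather than a SIGSEGV that kills the goroutine.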

What you expected to happen:

It should fail with a stage error, instead of panicking, when the app is first added and deployed as PipelineSync.

How to reproduce it:

Create and add a k8s app composed of the files below.

app.pipecd.yaml
deployment.yaml
service.yaml

app.pipecd.yaml

apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: primary-rollout
  labels:
    env: example
    team: product
  planner:
    alwaysUsePipeline: true
  pipeline:
    stages:
      - name: WAIT
        with:
          duration: 5s
      - name: K8S_PRIMARY_ROLLOUT
        with:
          prune: true

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: primary-rollout
  labels:
    app: primary-rollout
spec:
  replicas: 3
  selector:
    matchLabels:
      app: primary-rollout
      pipecd.dev/variant: primary
  template:
    metadata:
      labels:
        app: primary-rollout
        pipecd.dev/variant: primary
    spec:
      containers:
      - name: helloworld
        image: ghcr.io/pipe-cd/helloworld:v0.30.0
        args:
          - server
        ports:
        - containerPort: 9085

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: primary-rollout
spec:
  selector:
    app: primary-rollout
  ports:
    - protocol: TCP
      port: 9085
      targetPort: 9085

Environment:

  • piped version: v0.47.3-rc0
  • control-plane version:
  • Others:

ffjlabo, Jun 25 '24 14:06