cockroach-operator

Operator crashes when creating vcheck pod if an invalid value of topologySpreadConstraints.whenUnsatisfiable is given

Open hoyhbx opened this issue 2 years ago • 1 comment

What version of operator are you using? commit 561cf47d783c368fd8795acb82a5a39099a35984 (HEAD -> master)

What operating system and processor architecture are you using (kubectl version)? Ubuntu 20.04

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.22) exceeds the supported minor version skew of +/-1

What did you do?

I find that crdb-operator will crash when starting the vcheck job pod if whenUnsatisfiable under topologySpreadConstraints has an invalid value.

I created the cluster by applying the following custom resource file. Note that the value of whenUnsatisfiable is misspelled (DoNotScedule instead of DoNotSchedule):

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: test-cluster
spec:
  additionalLabels:
    crdb: is-cool
  dataStore:
    pvc:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        volumeMode: Filesystem
  image:
    name: cockroachdb/cockroach:v21.2.10
  nodes: 3
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 1Gi
  tlsEnabled: true
  topologySpreadConstraints:
  - maxSkew: 3
    topologyKey: MYKEY
    whenUnsatisfiable: DoNotScedule

(All files were applied using kubectl apply -f <filename> -n cockroach-operator-system)
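For reference, Kubernetes accepts only two values for whenUnsatisfiable: DoNotSchedule and ScheduleAnyway. The following is a minimal sketch of the kind of up-front check that would reject the misspelled value before any job is created. The helper name validateWhenUnsatisfiable is hypothetical and is not taken from the operator's code.

```go
package main

import "fmt"

// validWhenUnsatisfiable lists the two values Kubernetes accepts for
// topologySpreadConstraints[].whenUnsatisfiable.
var validWhenUnsatisfiable = map[string]bool{
	"DoNotSchedule":  true,
	"ScheduleAnyway": true,
}

// validateWhenUnsatisfiable is a hypothetical helper (not part of the
// operator) that returns an error for any value outside the accepted set.
func validateWhenUnsatisfiable(v string) error {
	if !validWhenUnsatisfiable[v] {
		return fmt.Errorf("invalid whenUnsatisfiable %q: must be DoNotSchedule or ScheduleAnyway", v)
	}
	return nil
}

func main() {
	// The second value is the misspelling from the custom resource above.
	for _, v := range []string{"DoNotSchedule", "DoNotScedule"} {
		if err := validateWhenUnsatisfiable(v); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", v)
	}
}
```

A check like this could surface the typo as a validation error rather than a crash later in reconciliation.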

What did you see?

The program crashed.

Comments

I checked the log of the operator and I found that a nil pointer dereference happened.

Log details
E0114 14:15:47.939699       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil point
goroutine 329 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x167d220, 0x274b1b0})
external/io_k8s_apimachinery/pkg/util/runtime/runtime.go:74 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x00033f7a0})
external/io_k8s_apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x167d220, 0x274b1b0})
GOROOT/src/runtime/panic.go:1038 +0x215
github.com/cockroachdb/cockroach-operator/pkg/actor.IsJobPodRunning({0x1ab8d98, 0xc00067f2c0}, {0x1b07190, 0xc0004674a0}, 0xc00049ad80, {0x1ad3f80, 0xc000582720})
pkg/actor/validate_version.go:381 +0xbe
github.com/cockroachdb/cockroach-operator/pkg/actor.WaitUntilJobPodIsRunning.func1()
pkg/actor/validate_version.go:429 +0x35
github.com/cenkalti/backoff.RetryNotify(0xc00083a720, {0x1a9718, 0xc0000a8240}, 0x0)
external/com_github_cenkalti_backoff/retry.go:37 +0x1ac
github.com/cenkalti/backoff.Retry(...)
external/com_github_cenkalti_backoff/retry.go:24
github.com/cockroachdb/cockroach-operator/pkg/actor.WaitUntilJobPodIsRunning({0x1ab8d98, 0xc00067f2c0}, {0x1b07190, 0xc0004674a0}, 0xc00049ad80, {0x1ad3f80, 0xc000582720})
pkg/actor/validate_version.go:434 +0x105
github.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act(0xc000423cb0, {0x1ab8d98, 0xc00067f2c0}, 0xc0000fb500, {0x1ad3f80, 0xc000582720})
pkg/actor/validate_version.go:151 +0x13cd
github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile(0x00077a980, {0x1ab8d98, 0xc0000fb2c0}, {{{0xc0003b8140, 0x19}, {0xc0003c3634, 0xc}}})

It seems that when whenUnsatisfiable is given an invalid value, the operator cannot find the vcheck job pod even though it expects to. This leads to a nil pointer dereference in IsJobPodRunning, at this line, where Selector.MatchLabels is accessed while Selector is nil.
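To illustrate the failure mode, here is a minimal sketch of a nil guard that returns an error instead of panicking. The types LabelSelector, JobSpec and Job are simplified stand-ins for the Kubernetes API types, and jobPodLabels is a hypothetical helper. None of this is the operator's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the Kubernetes types involved (assumption:
// only the fields relevant to the crash are modeled here).
type LabelSelector struct {
	MatchLabels map[string]string
}

type JobSpec struct {
	Selector *LabelSelector // may be nil when the job was not fully populated
}

type Job struct {
	Name string
	Spec JobSpec
}

var errNoSelector = errors.New("job has no selector")

// jobPodLabels is a hypothetical helper that checks Selector for nil
// before reading MatchLabels, so callers get an error rather than a panic.
func jobPodLabels(job *Job) (map[string]string, error) {
	if job == nil || job.Spec.Selector == nil {
		return nil, errNoSelector
	}
	return job.Spec.Selector.MatchLabels, nil
}

func main() {
	// A job whose Selector was never set, as in the crash described above.
	broken := &Job{Name: "vcheck"}
	if _, err := jobPodLabels(broken); err != nil {
		fmt.Println("error instead of panic:", err)
	}

	// A job with a populated Selector.
	ok := &Job{
		Name: "vcheck",
		Spec: JobSpec{Selector: &LabelSelector{MatchLabels: map[string]string{"job-name": "vcheck"}}},
	}
	labels, _ := jobPodLabels(ok)
	fmt.Println("labels:", labels)
}
```

With a guard of this kind, the retry loop around the pod lookup could report the missing selector as an ordinary error and let the reconciler surface it.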

hoyhbx, Jan 22 '23 21:01

@himanshu-cockroach can you please have a look?

prafull01, Apr 20 '23 14:04