cockroach-operator
Operator crashes when creating vcheck pod if an invalid value of topologySpreadConstraints.whenUnsatisfiable is given
What version of operator are you using? commit 561cf47d783c368fd8795acb82a5a39099a35984 (HEAD -> master)
What operating system and processor architecture are you using (kubectl version)?
Ubuntu 20.04
kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.22) exceeds the supported minor version skew of +/-1
What did you do?
I found that the operator (crdb-operator) crashes when starting the vcheck job pod if whenUnsatisfiable under topologySpreadConstraints has an invalid value.
I created the cluster by applying the following custom resource file. Note that the value of whenUnsatisfiable is misspelled (DoNotScedule instead of DoNotSchedule):
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: test-cluster
spec:
  additionalLabels:
    crdb: is-cool
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        volumeMode: Filesystem
  image:
    name: cockroachdb/cockroach:v21.2.10
  nodes: 3
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 1Gi
  tlsEnabled: true
  topologySpreadConstraints:
    - maxSkew: 3
      topologyKey: MYKEY
      whenUnsatisfiable: DoNotScedule
(All files were applied using kubectl apply -f <filename> -n cockroach-operator-system)
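For reference, the Kubernetes API accepts only DoNotSchedule and ScheduleAnyway for whenUnsatisfiable. A minimal sketch of an up-front check that would reject the misspelled value before any job is created might look like the following. The helper name validateWhenUnsatisfiable is hypothetical and is not part of the operator.

```go
package main

import "fmt"

// validWhenUnsatisfiable lists the two values the Kubernetes API allows
// for topologySpreadConstraints[].whenUnsatisfiable.
var validWhenUnsatisfiable = map[string]bool{
	"DoNotSchedule":  true,
	"ScheduleAnyway": true,
}

// validateWhenUnsatisfiable is a hypothetical helper (not operator code)
// that returns an error for any value outside the allowed set.
func validateWhenUnsatisfiable(v string) error {
	if !validWhenUnsatisfiable[v] {
		return fmt.Errorf("invalid whenUnsatisfiable %q: must be DoNotSchedule or ScheduleAnyway", v)
	}
	return nil
}

func main() {
	// The second value is the misspelling from the custom resource above.
	for _, v := range []string{"DoNotSchedule", "DoNotScedule"} {
		if err := validateWhenUnsatisfiable(v); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", v)
	}
}
```

A check like this would surface the typo as a validation error rather than a crash later in the reconcile loop.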
What did you see?
The operator crashed.
Comments
I checked the operator log and found that a nil pointer dereference occurred.
Log details
E0114 14:15:47.939699       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil point
goroutine 329 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x167d220, 0x274b1b0})
	external/io_k8s_apimachinery/pkg/util/runtime/runtime.go:74 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x00033f7a0})
	external/io_k8s_apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x167d220, 0x274b1b0})
	GOROOT/src/runtime/panic.go:1038 +0x215
github.com/cockroachdb/cockroach-operator/pkg/actor.IsJobPodRunning({0x1ab8d98, 0xc00067f2c0}, {0x1b07190, 0xc0004674a0}, 0xc00049ad80, {0x1ad3f80, 0xc000582720})
	pkg/actor/validate_version.go:381 +0xbe
github.com/cockroachdb/cockroach-operator/pkg/actor.WaitUntilJobPodIsRunning.func1()
	pkg/actor/validate_version.go:429 +0x35
github.com/cenkalti/backoff.RetryNotify(0xc00083a720, {0x1a9718, 0xc0000a8240}, 0x0)
	external/com_github_cenkalti_backoff/retry.go:37 +0x1ac
github.com/cenkalti/backoff.Retry(...)
	external/com_github_cenkalti_backoff/retry.go:24
github.com/cockroachdb/cockroach-operator/pkg/actor.WaitUntilJobPodIsRunning({0x1ab8d98, 0xc00067f2c0}, {0x1b07190, 0xc0004674a0}, 0xc00049ad80, {0x1ad3f80, 0xc000582720})
	pkg/actor/validate_version.go:434 +0x105
github.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act(0xc000423cb0, {0x1ab8d98, 0xc00067f2c0}, 0xc0000fb500, {0x1ad3f80, 0xc000582720})
	pkg/actor/validate_version.go:151 +0x13cd
github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile(0x00077a980, {0x1ab8d98, 0xc0000fb2c0}, {{{0xc0003b8140, 0x19}, {0xc0003c3634, 0xc}}})
It seems that when whenUnsatisfiable is set to an invalid value, the operator cannot find the vcheck job pod even though it expects to. This leads to a nil pointer dereference at this line, where Selector.MatchLabels is accessed while Selector is nil.
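One possible way to avoid the panic is to check Selector for nil before reading MatchLabels and return an error that the retry loop can handle. The sketch below illustrates the idea only. LabelSelector, JobSpec, and Job are simplified stand-ins for the Kubernetes types (where the job selector is likewise a pointer), and jobPodLabels and errNoSelector are hypothetical names. I have not checked how this would fit the actual code in validate_version.go.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the Kubernetes types; assumption for illustration.
type LabelSelector struct {
	MatchLabels map[string]string
}

type JobSpec struct {
	Selector *LabelSelector // pointer, so it may be nil
}

type Job struct {
	Name string
	Spec JobSpec
}

// errNoSelector is a hypothetical sentinel error for a job without a selector.
var errNoSelector = errors.New("job has no selector")

// jobPodLabels is a hypothetical helper that guards against a nil job or
// nil Selector instead of dereferencing it directly.
func jobPodLabels(job *Job) (map[string]string, error) {
	if job == nil || job.Spec.Selector == nil {
		return nil, errNoSelector
	}
	return job.Spec.Selector.MatchLabels, nil
}

func main() {
	// A job whose Selector was never populated, as in the reported crash.
	broken := &Job{Name: "vcheck"}
	if _, err := jobPodLabels(broken); err != nil {
		fmt.Println("error instead of panic:", err)
	}

	// A job with a populated selector returns its labels normally.
	ok := &Job{
		Name: "vcheck",
		Spec: JobSpec{Selector: &LabelSelector{
			MatchLabels: map[string]string{"job-name": "vcheck"},
		}},
	}
	labels, _ := jobPodLabels(ok)
	fmt.Println("labels:", labels)
}
```

With a guard like this, the caller would get an error it can log or retry on, and the invalid whenUnsatisfiable value could be reported to the user instead of crashing the operator.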
@himanshu-cockroach can you please have a look?