cockroach-operator
[BUG] Operator fails to provision a new cluster
I am trying to deploy the following cluster with operator v2.8.0:
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: primary
  namespace: db
spec:
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        volumeMode: Filesystem
    supportsAutoResize: true
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 1Gi
  tlsEnabled: true
  cockroachDBVersion: v22.1.2
  nodes: 3
  additionalLabels:
    db: pimary
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                    - cockroachdb
            topologyKey: kubernetes.io/hostname
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          db: primary
However, the operator fails to provision a new cluster with the following errors:
{"level":"info","ts":1670742105.1208632,"logger":"webhooks","msg":"default","name":"primary"}
{"level":"info","ts":1670742105.124504,"logger":"controller.CrdbCluster","msg":"Running action with name: VersionCheckerAction","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof"}
{"level":"warn","ts":1670742105.1245308,"logger":"controller.CrdbCluster","msg":"starting to check the logging config provided","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof"}
{"level":"warn","ts":1670742105.1246645,"logger":"controller.CrdbCluster","msg":"Log configuration for the cockroach cluster: \"{sinks: {stderr: {channels: [OPS, HEALTH], redact: true}}}\"","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof"}
{"level":"error","ts":1670742105.2422922,"logger":"controller.CrdbCluster","msg":"The cockroachdb logging API is set to value that is not supported by the operator, See the default logging configuration here (https://www.cockroachlabs.com/docs/stable/configure-logs.html#default-logging-configuration) ","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof","error":"signal: killed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act\n\tpkg/actor/validate_version.go:75\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"info","ts":1670742105.2424967,"logger":"controller.CrdbCluster","msg":"Error on action","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof","Action":"VersionCheckerAction","err":""}
{"level":"error","ts":1670742105.2425444,"logger":"controller.CrdbCluster","msg":"action failed","CrdbCluster":" db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof","error":"","errorVerbose":"\n(1) attached stack trace\n -- stack trace:\n | github.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act\n | \tpkg/actor/validate_version.go:76\n | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n | \tpkg/controller/cluster_controller.go:153\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n | runtime.goexit\n | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2)\nError types: (1) *withstack.withStack (2) 
*errutil.leafError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:185\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1670742105.252548,"logger":"controller-runtime.manager.controller.crdbcluster","msg":"Reconciler error","reconciler group":"crdb.cockroachlabs.com","reconciler kind":"CrdbCluster","name":"primary","namespace":"db","error":"","errorVerbose":"\n(1) attached stack trace\n -- stack trace:\n | github.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act\n | \tpkg/actor/validate_version.go:76\n | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n | \tpkg/controller/cluster_controller.go:153\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n | runtime.goexit\n | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2)\nError types: (1) *withstack.withStack (2) 
*errutil.leafError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:301\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
Upon further investigation, it appears that any cockroach process started in the operator pod is killed immediately (note the "signal: killed" error in the log above).
Interestingly, switching from cockroachDBVersion to an explicit image fixes it:
image:
  name: cockroachdb/cockroach:arm64-v22.2.0
I had the same issue; however, increasing the operator's memory limit (32Mi is the value from the manifest) helped.
Edit: Oops, my bad, I forgot that I put the limit myself at the same value as the request one. The install manifest doesn't include limits.
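For anyone who hits the same "signal: killed" symptom, here is a minimal sketch of what raising the operator's memory limit could look like. The Deployment name, namespace, and container name are assumptions about a default install and should be checked against your own cluster; the 256Mi limit is an illustrative value, not one suggested in this thread. Only the 32Mi request comes from the comment above.

```yaml
# Hypothetical patch for the operator Deployment.
# All names below are assumptions; verify them with `kubectl get deploy -A`.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cockroach-operator-manager      # assumed Deployment name
  namespace: cockroach-operator-system  # assumed namespace
spec:
  template:
    spec:
      containers:
        - name: cockroach-operator      # assumed container name
          resources:
            requests:
              memory: 32Mi              # request value mentioned in the thread
            limits:
              memory: 256Mi             # illustrative; leaves headroom above the request
```

Setting the limit equal to a 32Mi request is what caused the kills in my case, so either a higher limit or no limit at all (as in the install manifest) should avoid it.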