
[BUG] Operator fails to provision a new cluster

Open • munjalpatel opened this issue 2 years ago • 3 comments

I am trying to deploy the following cluster with operator v2.8.0:

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: primary
  namespace: db
spec:
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        volumeMode: Filesystem
    supportsAutoResize: true
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 1Gi
  tlsEnabled: true
  cockroachDBVersion: v22.1.2
  nodes: 3
  additionalLabels:
    db: primary
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                    - cockroachdb
            topologyKey: kubernetes.io/hostname
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          db: primary

However, the operator fails to provision a new cluster with the following errors:

{"level":"info","ts":1670742105.1208632,"logger":"webhooks","msg":"default","name":"primary"}
{"level":"info","ts":1670742105.124504,"logger":"controller.CrdbCluster","msg":"Running action with name: VersionCheckerAction","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof"}
{"level":"warn","ts":1670742105.1245308,"logger":"controller.CrdbCluster","msg":"starting to check the logging config provided","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof"}
{"level":"warn","ts":1670742105.1246645,"logger":"controller.CrdbCluster","msg":"Log configuration for the cockroach cluster: \"{sinks: {stderr: {channels: [OPS, HEALTH], redact: true}}}\"","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof"}
{"level":"error","ts":1670742105.2422922,"logger":"controller.CrdbCluster","msg":"The cockroachdb logging API is set to value that is not supported by the operator, See the default logging configuration here (https://www.cockroachlabs.com/docs/stable/configure-logs.html#default-logging-configuration) ","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof","error":"signal: killed","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act\n\tpkg/actor/validate_version.go:75\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"info","ts":1670742105.2424967,"logger":"controller.CrdbCluster","msg":"Error on action","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof","Action":"VersionCheckerAction","err":""}
{"level":"error","ts":1670742105.2425444,"logger":"controller.CrdbCluster","msg":"action failed","CrdbCluster":"db/primary","ReconcileId":"rw6qQAnW4WTDJzFFNg2Dof","error":"","errorVerbose":"\n(1) attached stack trace\n  -- stack trace:\n  | github.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act\n  | \tpkg/actor/validate_version.go:76\n  | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n  | \tpkg/controller/cluster_controller.go:153\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n  | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n  | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n  | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n  | runtime.goexit\n  | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2)\nError types: (1) *withstack.withStack (2) *errutil.leafError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:185\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1670742105.252548,"logger":"controller-runtime.manager.controller.crdbcluster","msg":"Reconciler error","reconciler group":"crdb.cockroachlabs.com","reconciler kind":"CrdbCluster","name":"primary","namespace":"db","error":"","errorVerbose":"\n(1) attached stack trace\n  -- stack trace:\n  | github.com/cockroachdb/cockroach-operator/pkg/actor.(*versionChecker).Act\n  | \tpkg/actor/validate_version.go:76\n  | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n  | \tpkg/controller/cluster_controller.go:153\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n  | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n  | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n  | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n  | runtime.goexit\n  | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2)\nError types: (1) *withstack.withStack (2) *errutil.leafError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:301\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}

munjalpatel, Dec 11 '22 07:12

Upon further investigation, it appears that any cockroach command run on the operator pod is killed immediately.


munjalpatel, Dec 11 '22 16:12

Interestingly, switching from cockroachDBVersion to an explicit image fixes it:

image:
    name: cockroachdb/cockroach:arm64-v22.2.0
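
For readers applying the same workaround, here is a minimal sketch of where the override would sit in the manifest from the original report. It assumes that image replaces cockroachDBVersion rather than being set alongside it, and that every field not shown stays as posted. The thread does not confirm either point.

```yaml
# Sketch only: the reported CrdbCluster with the version field swapped for an
# explicit image. Not a complete manifest; see the comments below.
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: primary
  namespace: db
spec:
  tlsEnabled: true
  nodes: 3
  # cockroachDBVersion: v22.1.2   # dropped (assumption: not combined with image)
  image:
    name: cockroachdb/cockroach:arm64-v22.2.0
  # dataStore, resources, additionalLabels, affinity and
  # topologySpreadConstraints are assumed unchanged from the original manifest.
```

The tag is the one quoted in the comment. Its arm64 prefix suggests the cluster runs on ARM nodes, which may or may not be related to the failure.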

munjalpatel, Dec 11 '22 18:12

I had the same issue. Increasing the operator's memory limit helped (32Mi is the value from the manifest).

Edit: Oops, my bad. I forgot that I had set the limit myself, to the same value as the request. The install manifest doesn't include limits.
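
As a rough illustration of that change, the fragment below shows where a memory limit would sit on the operator's Deployment (not on the CrdbCluster). The container name and the 256Mi figure are assumptions made for the sketch; only the 32Mi request comes from the comment above. The "signal: killed" error in the logs would be consistent with a child process being killed when the container hits its memory limit, though nothing in the thread confirms that.

```yaml
# Fragment of the operator Deployment's pod template (hypothetical names/values).
spec:
  template:
    spec:
      containers:
        - name: cockroach-operator   # assumed container name
          resources:
            requests:
              memory: 32Mi           # value reported from the manifest
            limits:
              memory: 256Mi          # illustrative only; raise it, or omit limits as the install manifest does
```

If a limit equal to the 32Mi request was added by hand, raising or removing it is the change described in the comment.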

Erouan50, Oct 25 '23 11:10