
error when upgrading crdb version

Open • scirner22 opened this issue 3 years ago • 1 comment

Versions:
operator: 2.5
crdb: 21.1.11

I changed the image in the crdb resource from v21.1.11 to v21.1.13, and the operator is unable to perform the upgrade. I'm running pretty much the default examples provided.

{"level":"info","ts":1644447812.3141432,"logger":"controller.CrdbCluster","msg":"reconciling CockroachDB cluster","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"info","ts":1644447812.3142016,"logger":"webhooks","msg":"default","name":"cockroachdb"}
{"level":"info","ts":1644447812.3188086,"logger":"controller.CrdbCluster","msg":"Running action with name: PartitionedUpdate","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"warn","ts":1644447812.3188314,"logger":"controller.CrdbCluster","msg":"checking update opportunities, using a partitioned update","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"warn","ts":1644447812.3188922,"logger":"controller.CrdbCluster","msg":"operator is running inside of kubernetes, connecting to service for db connection","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"info","ts":1644447812.3661704,"logger":"controller.CrdbCluster","msg":"Error on action","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa","Action":"PartitionedUpdate","err":"failed to create database connection: opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host"}
{"level":"error","ts":1644447812.366221,"logger":"controller.CrdbCluster","msg":"action failed","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa","error":"failed to create database connection: opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host","errorVerbose":"failed to create database connection: opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host\n(1) attached stack trace\n  -- stack trace:\n  | github.com/cockroachdb/cockroach-operator/pkg/actor.(*partitionedUpdate).Act\n  | \tpkg/actor/partitioned_update.go:172\n  | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n  | \tpkg/controller/cluster_controller.go:152\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n  | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n  | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n  | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n  | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n  | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n  | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n  | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n  | runtime.goexit\n  | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2) failed to create database connection\nWraps: (3) opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:179\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
kubectl get services -n default cockroachdb-public
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                        AGE
cockroachdb-public   ClusterIP   10.192.0.194   <none>        26258/TCP,8080/TCP,26257/TCP   86d
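
For anyone skimming the log, the useful detail is buried in the "err" field: the lookup is for the bare name cockroachdb-public, with no namespace attached. Below is a minimal Python sketch that pulls the host, DNS server, and reason out of such an entry. The log entry is trimmed to a few fields copied from the output above, and the helper name dns_failure is made up for illustration; the regex assumes the message follows Go's usual "lookup <host> on <server>: <reason>" wording.

```python
import json
import re

# One log entry, trimmed to the fields used here.
# The "err" text is copied from the operator log above.
log_line = json.dumps({
    "level": "info",
    "msg": "Error on action",
    "Action": "PartitionedUpdate",
    "err": "failed to create database connection: opening a DB connection failed "
           "testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host",
})

# Assumption: the DNS error is worded "lookup <host> on <server>: <reason>".
LOOKUP_RE = re.compile(r"lookup (?P<host>\S+) on (?P<server>\S+): (?P<reason>.+)$")


def dns_failure(line):
    """Return host/server/reason for a failed lookup in a JSON log line, or None."""
    entry = json.loads(line)
    match = LOOKUP_RE.search(entry.get("err", ""))
    return match.groupdict() if match else None


failure = dns_failure(log_line)
# A host without any dots is an unqualified (namespace-less) service name.
is_short_name = "." not in failure["host"]
print(failure, is_short_name)
```

The host coming back without any dots suggests the name is being resolved relative to whatever namespace the operator pod runs in, which may not be the namespace where the service shown above lives.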

Resources

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.5.0/install/crds.yaml
  - https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.5.0/install/operator.yaml

patches:
- patch: |-
    - op: add
      path: /spec/template/spec/containers/0/args/-
      value: -feature-gates
    - op: add
      path: /spec/template/spec/containers/0/args/-
      value: TolerationRules=true,AffinityRules=true
  target:
    kind: Deployment
---
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/instance
              operator: In
              values:
              - cockroachdb
          topologyKey: kubernetes.io/hostname
        weight: 100
  dataStore:
    pvc:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        volumeMode: Filesystem
  image:
    name: cockroachdb/cockroach:v21.1.13
  nodeSelector:
    worker-pool-name: crdb-node-pool
  nodes: 3
  resources:
    limits:
      cpu: "3"
      memory: 12Gi
    requests:
      cpu: "2"
      memory: 8Gi
  tlsEnabled: true
  tolerations:
  - effect: NoSchedule
    key: reservation
    operator: Equal
    value: cockroachdb

Feb 09 '22 23:02 scirner22

I'm getting the same issue. It seems that, when running in Kubernetes, the operator connects using the short public service name (cockroachdb-public), which it can't resolve because the service is in a different namespace from the operator.

You can work around this either by running the operator and the cluster in the same namespace, or by creating an ExternalName service in the operator's namespace that points back to the service in the cluster's namespace.

Example:

kind: Service
apiVersion: v1
metadata:
  name: cockroachdb-public
  namespace: cockroach-operator-system
spec:
  type: ExternalName
  externalName: cockroachdb-public.<cockroach cluster namespace>.svc.cluster.local
  ports:
  - port: 26257
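
To see why the short name fails across namespaces, here is a rough Python sketch of the names a pod's resolver would try for an unqualified service name. It assumes the usual Kubernetes DNS search path (<pod namespace>.svc.<cluster domain>, svc.<cluster domain>, <cluster domain>) and the default cluster domain cluster.local; both function names are made up for illustration and are not part of the operator.

```python
def search_candidates(name, pod_namespace, cluster_domain="cluster.local"):
    """Names tried for a short service name from a pod in pod_namespace.

    Assumes the common Kubernetes search path; the bare name is tried last.
    """
    search = [
        f"{pod_namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{suffix}" for suffix in search] + [name]


def service_fqdn(name, service_namespace, cluster_domain="cluster.local"):
    """Fully qualified DNS name of a Service in service_namespace."""
    return f"{name}.{service_namespace}.svc.{cluster_domain}"


operator_ns = "cockroach-operator-system"  # namespace used in the example above
cluster_ns = "default"                     # namespace of the CrdbCluster in the logs

target = service_fqdn("cockroachdb-public", cluster_ns)
from_operator = search_candidates("cockroachdb-public", operator_ns)
from_cluster_ns = search_candidates("cockroachdb-public", cluster_ns)

print(target in from_operator)    # the operator's search path never reaches the service
print(target in from_cluster_ns)  # a pod in the cluster's namespace would find it
```

The ExternalName service above fills that gap: it makes cockroachdb-public resolvable inside the operator's namespace and forwards to the fully qualified name in the cluster's namespace.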

Feb 24 '22 22:02 jfrconley

Has this been fixed by https://github.com/cockroachdb/cockroach-operator/pull/943? I presume that shipped in release 2.9.0.

Feb 06 '23 15:02 SomeDatabaseDude