cockroach-operator
error when upgrading crdb version
Versions: operator: 2.5 crdb: 21.1.11
I changed the image version in the CrdbCluster resource to 21.1.13, and the operator is unable to perform the upgrade. I'm running pretty much the default examples provided.
{"level":"info","ts":1644447812.3141432,"logger":"controller.CrdbCluster","msg":"reconciling CockroachDB cluster","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"info","ts":1644447812.3142016,"logger":"webhooks","msg":"default","name":"cockroachdb"}
{"level":"info","ts":1644447812.3188086,"logger":"controller.CrdbCluster","msg":"Running action with name: PartitionedUpdate","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"warn","ts":1644447812.3188314,"logger":"controller.CrdbCluster","msg":"checking update opportunities, using a partitioned update","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"warn","ts":1644447812.3188922,"logger":"controller.CrdbCluster","msg":"operator is running inside of kubernetes, connecting to service for db connection","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa"}
{"level":"info","ts":1644447812.3661704,"logger":"controller.CrdbCluster","msg":"Error on action","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa","Action":"PartitionedUpdate","err":"failed to create database connection: opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host"}
{"level":"error","ts":1644447812.366221,"logger":"controller.CrdbCluster","msg":"action failed","CrdbCluster":"default/cockroachdb","ReconcileId":"QxbDSJ6eSddyvYGRWepNKa","error":"failed to create database connection: opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host","errorVerbose":"failed to create database connection: opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host\n(1) attached stack trace\n -- stack trace:\n | github.com/cockroachdb/cockroach-operator/pkg/actor.(*partitionedUpdate).Act\n | \tpkg/actor/partitioned_update.go:172\n | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n | \tpkg/controller/cluster_controller.go:152\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n 
| \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n | runtime.goexit\n | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2) failed to create database connection\nWraps: (3) opening a DB connection failed testing db connection failed: lookup cockroachdb-public on 10.192.0.10:53: no such host\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:179\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
kubectl get services -n default cockroachdb-public
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                        AGE
cockroachdb-public   ClusterIP   10.192.0.194   <none>        26258/TCP,8080/TCP,26257/TCP   86d
Resources
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.5.0/install/crds.yaml
- https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.5.0/install/operator.yaml
patches:
- patch: |-
    - op: add
      path: /spec/template/spec/containers/0/args/-
      value: -feature-gates
    - op: add
      path: /spec/template/spec/containers/0/args/-
      value: TolerationRules=true,AffinityRules=true
  target:
    kind: Deployment
---
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/instance
              operator: In
              values:
              - cockroachdb
          topologyKey: kubernetes.io/hostname
        weight: 100
  dataStore:
    pvc:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        volumeMode: Filesystem
  image:
    name: cockroachdb/cockroach:v21.1.13
  nodeSelector:
    worker-pool-name: crdb-node-pool
  nodes: 3
  resources:
    limits:
      cpu: "3"
      memory: 12Gi
    requests:
      cpu: "2"
      memory: 8Gi
  tlsEnabled: true
  tolerations:
  - effect: NoSchedule
    key: reservation
    operator: Equal
    value: cockroachdb
I'm getting the same issue. It seems that the operator, when running inside Kubernetes, connects using the bare public service name (cockroachdb-public). That name only resolves from within the cluster's own namespace, so the lookup fails with "no such host" when the operator runs in a different namespace.
You can work around this either by running the operator and the cluster in the same namespace, or by creating an ExternalName service in the operator's namespace that points back to the public service in the cluster's namespace.
Example:
kind: Service
apiVersion: v1
metadata:
  name: cockroachdb-public
  namespace: cockroach-operator-system
spec:
  type: ExternalName
  externalName: cockroachdb-public.<cockroach cluster namespace>.svc.cluster.local
  ports:
  - port: 26257
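To illustrate why the bare name fails across namespaces, here is a rough Python sketch of how a pod's resolver expands a short service name. It is a simplified model, not the operator's code: `candidate_lookups` is a hypothetical helper, and it assumes the default `cluster.local` cluster domain and the usual Kubernetes `ndots:5` setting. Real pods may also inherit extra search domains from the node.

```python
def candidate_lookups(name, pod_namespace, cluster_domain="cluster.local", ndots=5):
    """Return the names a pod's resolver would try for `name`, in order.

    Simplified model of the search list Kubernetes writes into a pod's
    /etc/resolv.conf (assumption: default dnsPolicy, no extra node domains).
    """
    # A trailing dot marks the name as absolute: no search domains are applied.
    if name.endswith("."):
        return [name[:-1]]

    search = [
        f"{pod_namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{name}.{domain}" for domain in search]

    # With at least `ndots` dots the name is tried as-is first,
    # otherwise the search domains are tried first.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


# Namespaces taken from this thread: operator in cockroach-operator-system,
# cluster in default.
target = "cockroachdb-public.default.svc.cluster.local"

from_operator_ns = candidate_lookups("cockroachdb-public", "cockroach-operator-system")
print(target in from_operator_ns)  # False: the service is never looked up in "default"

from_cluster_ns = candidate_lookups("cockroachdb-public", "default")
print(target in from_cluster_ns)   # True: same namespace, so the short name resolves
```

Under these assumptions, the short name is only ever expanded with the operator's own namespace, which is why either co-locating the operator and the cluster or adding the ExternalName service above makes the lookup succeed.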
I presume this has been fixed by https://github.com/cockroachdb/cockroach-operator/pull/943 and release 2.9.0?