cockroach-operator
Cannot decommission node
When decommissioning a node through the operator, the operator logs the following errors:
{"level":"warn","ts":1665138574.1910405,"logger":"controller.CrdbCluster","msg":"scaling down stateful set","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"kF7Vns39vPGnqXncUhmWnX","have":5,"want":4}
{"level":"error","ts":1665138574.8271742,"logger":"controller.CrdbCluster","msg":"decommission failed","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"kF7Vns39vPGnqXncUhmWnX","error":"failed to stream execution results back: command terminated with exit code 1","errorVerbose":"failed to stream execution results back: command terminated with exit code 1\n(1) attached stack trace\n -- stack trace:\n | github.com/cockroachdb/cockroach-operator/pkg/scale.CockroachExecutor.Exec\n | \tpkg/scale/executor.go:57\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*CockroachNodeDrainer).findNodeID\n | \tpkg/scale/drainer.go:242\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*CockroachNodeDrainer).Decommission\n | \tpkg/scale/drainer.go:79\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*Scaler).EnsureScale\n | \tpkg/scale/scale.go:91\n | github.com/cockroachdb/cockroach-operator/pkg/actor.decommission.Act\n | \tpkg/actor/decommission.go:143\n | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n | \tpkg/controller/cluster_controller.go:153\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n | 
\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n | runtime.goexit\n | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2) failed to stream execution results back\nWraps: (3) command terminated with exit code 1\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) exec.CodeExitError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/actor.decommission.Act\n\tpkg/actor/decommission.go:145\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithCo
ntext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"info","ts":1665138574.8283174,"logger":"controller.CrdbCluster","msg":"Error on action","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"kF7Vns39vPGnqXncUhmWnX","Action":"Decommission","err":"failed to stream execution results back: command terminated with exit code 1"}
{"level":"error","ts":1665138574.8283627,"logger":"controller.CrdbCluster","msg":"action failed","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"kF7Vns39vPGnqXncUhmWnX","error":"failed to stream execution results back: command terminated with exit code 1","errorVerbose":"failed to stream execution results back: command terminated with exit code 1\n(1) attached stack trace\n -- stack trace:\n | github.com/cockroachdb/cockroach-operator/pkg/scale.CockroachExecutor.Exec\n | \tpkg/scale/executor.go:57\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*CockroachNodeDrainer).findNodeID\n | \tpkg/scale/drainer.go:242\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*CockroachNodeDrainer).Decommission\n | \tpkg/scale/drainer.go:79\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*Scaler).EnsureScale\n | \tpkg/scale/scale.go:91\n | github.com/cockroachdb/cockroach-operator/pkg/actor.decommission.Act\n | \tpkg/actor/decommission.go:143\n | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n | \tpkg/controller/cluster_controller.go:153\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n | 
\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n | runtime.goexit\n | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2) failed to stream execution results back\nWraps: (3) command terminated with exit code 1\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) exec.CodeExitError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\ngithub.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n\tpkg/controller/cluster_controller.go:185\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.Unti
lWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1665138574.836441,"logger":"controller-runtime.manager.controller.crdbcluster","msg":"Reconciler error","reconciler group":"crdb.cockroachlabs.com","reconciler kind":"CrdbCluster","name":"cockroachdb","namespace":"cockroach-cluster-stage","error":"failed to stream execution results back: command terminated with exit code 1","errorVerbose":"failed to stream execution results back: command terminated with exit code 1\n(1) attached stack trace\n -- stack trace:\n | github.com/cockroachdb/cockroach-operator/pkg/scale.CockroachExecutor.Exec\n | \tpkg/scale/executor.go:57\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*CockroachNodeDrainer).findNodeID\n | \tpkg/scale/drainer.go:242\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*CockroachNodeDrainer).Decommission\n | \tpkg/scale/drainer.go:79\n | github.com/cockroachdb/cockroach-operator/pkg/scale.(*Scaler).EnsureScale\n | \tpkg/scale/scale.go:91\n | github.com/cockroachdb/cockroach-operator/pkg/actor.decommission.Act\n | \tpkg/actor/decommission.go:143\n | github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile\n | \tpkg/controller/cluster_controller.go:153\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:297\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\n | sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n | \texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\n | 
k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\n | k8s.io/apimachinery/pkg/util/wait.JitterUntil\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\n | k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\n | k8s.io/apimachinery/pkg/util/wait.UntilWithContext\n | \texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99\n | runtime.goexit\n | \tsrc/runtime/asm_amd64.s:1581\nWraps: (2) failed to stream execution results back\nWraps: (3) command terminated with exit code 1\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) exec.CodeExitError","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\texternal/com_github_go_logr_zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:301\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:252\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\texternal/io_k8s_sigs_controller_runtime/pkg/internal/controller/controller.go:215\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\texternal/io_k8s_apimachinery/pkg/util/wait/wait.go:99"}
{"level":"info","ts":1665138584.3979504,"logger":"controller.CrdbCluster","msg":"reconciling CockroachDB cluster","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"URAajRJhYWotEB4tQs6hRm"}
{"level":"info","ts":1665138584.3980412,"logger":"webhooks","msg":"default","name":"cockroachdb"}
{"level":"info","ts":1665138584.4027824,"logger":"controller.CrdbCluster","msg":"Running action with name: Decommission","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"URAajRJhYWotEB4tQs6hRm"}
{"level":"warn","ts":1665138584.4028075,"logger":"controller.CrdbCluster","msg":"check decommission opportunities","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"URAajRJhYWotEB4tQs6hRm"}
{"level":"info","ts":1665138584.4028518,"logger":"controller.CrdbCluster","msg":"replicas decommissioning","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"URAajRJhYWotEB4tQs6hRm","status.CurrentReplicas":5,"expected":4}
{"level":"warn","ts":1665138584.4028952,"logger":"controller.CrdbCluster","msg":"operator is running inside of kubernetes, connecting to service for db connection","CrdbCluster":"cockroach-cluster-stage/cockroachdb","ReconcileId":"URAajRJhYWotEB4tQs6hRm"}
When I try to decommission the node with the cockroach client instead, it also fails:
$ kubectl -n cockroach-cluster-stage exec -it cockroachdb-client-secure -- ./cockroach node decommission 4 --certs-dir=/cockroach/cockroach-certs --host=cockroachdb-public.cockroach-cluster-stage
ERROR: operation timed out.
failed to connect to the node: initial connection heartbeat failed: operation "rpc heartbeat" timed out after 6.001s (given timeout 6s): rpc error: code = DeadlineExceeded desc = context deadline exceeded
Failed running "node decommission"
command terminated with exit code 1
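One thing that may be worth ruling out (a guess from the output in this report, not a confirmed diagnosis): the failing step is an RPC heartbeat, `--host` is given without a port, and the `node status` output in this report lists each node's `address` on port 26258 while its `sql_address` is on 26257. The sketch below pulls the advertised `address` of a node out of the `node status` table so that it can be passed to `--host` explicitly. `rpc_addr_for_node` is a hypothetical helper name, and it assumes the pipe-separated table layout shown in this report.

```shell
# Hypothetical helper: print the "address" column (second column) that
# `cockroach node status` reports for the node ID given as $1.
# Reads the pipe-separated table from stdin; prints nothing if the ID is absent.
rpc_addr_for_node() {
  awk -F'|' -v id="$1" '
    {
      # Trim the padding around the first two columns.
      gsub(/^[ \t]+|[ \t]+$/, "", $1)
      gsub(/^[ \t]+|[ \t]+$/, "", $2)
    }
    # Header and separator rows never equal a node ID, so they are skipped.
    ($1 "") == (id "") { print $2; exit }
  '
}

# Usage sketch (not run here; pod, namespace, and paths are the ones from this report):
#   addr=$(kubectl -n cockroach-cluster-stage exec cockroachdb-client-secure -- \
#     ./cockroach node status --format=table \
#     --certs-dir=/cockroach/cockroach-certs \
#     --host=cockroachdb-public.cockroach-cluster-stage | rpc_addr_for_node 1)
#   kubectl -n cockroach-cluster-stage exec -it cockroachdb-client-secure -- \
#     ./cockroach node decommission 4 \
#     --certs-dir=/cockroach/cockroach-certs --host="$addr"
```

If the decommission gets past the heartbeat when pointed at that address, the port was the problem; if it still times out, this explanation can be ruled out.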
The nodes themselves are all reachable from inside the Kubernetes cluster, and every one of them is reported as live:
$ kubectl -n cockroach-cluster-stage exec -it cockroachdb-client-secure -- ./cockroach node status --decommission --certs-dir=/cockroach/cockroach-certs --host=cockroachdb-public.cockroach-cluster-stage
id | address | sql_address | build | started_at | updated_at | locality | is_available | is_live | gossiped_replicas | is_decommissioning | membership | is_draining
-----+---------------------------------------------------------+---------------------------------------------------------+---------+----------------------------+----------------------------+----------+--------------+---------+-------------------+--------------------+------------+--------------
1 | cockroachdb-0.cockroachdb.cockroach-cluster-stage:26258 | cockroachdb-0.cockroachdb.cockroach-cluster-stage:26257 | v22.1.2 | 2022-10-07 11:21:29.236929 | 2022-10-07 11:47:48.769312 | | true | true | 33 | false | active | false
2 | cockroachdb-1.cockroachdb.cockroach-cluster-stage:26258 | cockroachdb-1.cockroachdb.cockroach-cluster-stage:26257 | v22.1.2 | 2022-10-07 11:21:29.497315 | 2022-10-07 11:47:49.009594 | | true | true | 34 | false | active | false
3 | cockroachdb-2.cockroachdb.cockroach-cluster-stage:26258 | cockroachdb-2.cockroachdb.cockroach-cluster-stage:26257 | v22.1.2 | 2022-10-07 11:21:29.618476 | 2022-10-07 11:47:49.135673 | | true | true | 33 | false | active | false
4 | cockroachdb-3.cockroachdb.cockroach-cluster-stage:26258 | cockroachdb-3.cockroachdb.cockroach-cluster-stage:26257 | v22.1.2 | 2022-10-07 11:31:22.817727 | 2022-10-07 11:47:48.334172 | | true | true | 32 | false | active | false