cass-operator

Unable to use cassandra 4.1.X version

Open JokerDevops opened this issue 1 year ago • 7 comments

What happened?

This is the log that appears in Cassandra:

2024-10-21T09:49:14.522Z	INFO	Stopping and waiting for leader election runnables
2024-10-21T09:49:14.523Z	INFO	Starting EventSource	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "source": "kind source: *v1beta1.CassandraDatacenter"}
2024-10-21T09:49:14.524Z	INFO	Starting EventSource	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "source": "kind source: *v1.StatefulSet"}
2024-10-21T09:49:14.524Z	INFO	Starting EventSource	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "source": "kind source: *v1alpha1.CassandraTask"}
2024-10-21T09:49:14.524Z	INFO	Starting Controller	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask"}
2024-10-21T09:49:14.524Z	INFO	Starting workers	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "worker count": 1}
2024-10-21T09:49:14.524Z	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask"}
2024-10-21T09:49:14.524Z	INFO	All workers finished	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask"}
2024-10-21T09:49:14.524Z	INFO	Starting EventSource	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "source": "kind source: *v1.PodDisruptionBudget"}
2024-10-21T09:49:14.524Z	INFO	controller-runtime.metrics	Serving metrics server	{"bindAddress": ":8080", "secure": false}
2024-10-21T09:49:14.524Z	INFO	Starting EventSource	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "source": "kind source: *v1.Service"}
2024-10-21T09:49:14.524Z	INFO	Starting EventSource	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "source": "kind source: *v1.Secret"}
2024-10-21T09:49:14.524Z	INFO	Starting EventSource	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "source": "kind source: *v1.Secret"}
2024-10-21T09:49:14.524Z	INFO	Starting Controller	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter"}
2024-10-21T09:49:14.524Z	INFO	Starting workers	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "worker count": 1}
2024-10-21T09:49:14.524Z	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter"}
2024-10-21T09:49:14.524Z	INFO	All workers finished	{"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter"}
2024-10-21T09:49:14.524Z	INFO	Stopping and waiting for caches
2024-10-21T09:49:14.536Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1beta1.CassandraDatacenter Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.537Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1.PodDisruptionBudget Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.541Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1.Secret Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.541Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1.Service Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.543Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1.StatefulSet Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.546Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1alpha1.CassandraTask Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.546Z	ERROR	controller-runtime.source.EventHandler	failed to get informer from cache	{"error": "Timeout: failed waiting for *v1.Secret Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2024-10-21T09:49:14.546Z	INFO	Stopping and waiting for webhooks
2024-10-21T09:49:14.546Z	INFO	Stopping and waiting for HTTP servers
2024-10-21T09:49:14.546Z	INFO	controller-runtime.metrics	Shutting down metrics server with timeout of 1 minute
2024-10-21T09:49:14.547Z	INFO	Wait completed, proceeding to shutdown the manager
2024-10-21T09:49:14.547Z	ERROR	setup	problem running manager	{"error": "open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}
main.main
	/workspace/cmd/main.go:157
runtime.main
	/usr/local/go/src/runtime/proc.go:250

It looks for the certificate file /tmp/k8s-webhook-server/serving-certs/tls.crt locally.

You can view this document: https://book.kubebuilder.io/cronjob-tutorial/running.html

What I don't understand is why it still looks for the certificate file locally even though I have cert-manager running. What is the logic here?

What did you expect to happen?

No response

How can we reproduce it (as minimally and precisely as possible)?

Deploy Cassandra 4.1.X and the error appears.

cass-operator version

v1.21.0

Kubernetes version

v1.28.2

Method of installation

helm

Anything else we need to know?

No response

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: CASS-72

JokerDevops avatar Oct 21 '24 09:10 JokerDevops

cert-manager simply mounts the crt files to disk (through a Secret); that's perfectly normal. It seems you have some issues in your Kubernetes environment, as even the operator can't reliably connect to it. Perhaps it is overloaded?

burmanm avatar Oct 21 '24 10:10 burmanm

> cert-manager simply mounts the crt files to disk (through a Secret); that's perfectly normal. It seems you have some issues in your Kubernetes environment, as even the operator can't reliably connect to it. Perhaps it is overloaded?

But with all other conditions unchanged, it ran fine with Cassandra 4.0.1.

JokerDevops avatar Oct 22 '24 02:10 JokerDevops

None of those logs indicate that the operator even noticed your CassandraDatacenter. They are all startup logs of the operator, not logs from actually processing a datacenter.

So at that stage the operator doesn't know whether you deployed 4.1, 4.0.1, or nothing at all. If something succeeded later with 4.0.1, the operator had already started correctly, and it would have done the same for 4.1.

burmanm avatar Oct 22 '24 06:10 burmanm

> None of those logs indicate that the operator even noticed your CassandraDatacenter. They are all startup logs of the operator, not logs from actually processing a datacenter.
>
> So at that stage the operator doesn't know whether you deployed 4.1, 4.0.1, or nothing at all. If something succeeded later with 4.0.1, the operator had already started correctly, and it would have done the same for 4.1.

But this is the log generated by the Cassandra pod, not the log generated by the operator.

JokerDevops avatar Oct 22 '24 11:10 JokerDevops

I saw that 4.1 uses the project's own k8ssandra-client. Is that related to this problem?

JokerDevops avatar Oct 22 '24 11:10 JokerDevops

I tried repeatedly and got the same error message every time.

JokerDevops avatar Oct 22 '24 11:10 JokerDevops

That log: 2024-10-21T09:49:14.536Z ERROR controller-runtime.source.EventHandler is from the operator, not from the Cassandra pod.

burmanm avatar Oct 22 '24 12:10 burmanm

Closing this due to inactivity.

burmanm avatar Apr 17 '25 07:04 burmanm