strimzi-kafka-operator
InvalidStateException on topic after upgrade to Strimzi 0.29.0 and Kafka 3.2
Describe the bug
I recently completed an upgrade from Strimzi 0.27.1 and Kafka 2.8 to Strimzi 0.29.0 and Kafka 3.2. My existing topics seem to be working correctly, but when I try to create a new topic after the upgrade, the topic never becomes ready, with a reason of "InvalidStateException". My attempts to delete the topic have also failed.
To Reproduce
Steps to reproduce the behavior:
- Start on Strimzi 0.27.1 and Kafka 2.8.
- Upgrade to Strimzi 0.29.0 and Kafka 3.2 using the documented instructions.
- Deploy the following KafkaTopic:
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: deletethis
  labels:
    strimzi.io/cluster: demo
spec:
  partitions: 10
  replicas: 3
  config:
    retention.ms: 259200000
    segment.bytes: 1073741824
Result: The topic resource is created, but its status reports NotReady with a reason of InvalidStateException.
The topic operator says the following in the logs:
INFO [vert.x-eventloop-thread-1] TopicOperator:577 - Reconciliation #1811754(periodic kube deletethis) KafkaTopic(kafka/deletethis): Reconciling topic deletethis, k8sTopic: nonnull, kafkaTopic: nonnull, privateTopic: null
WARN [vert.x-eventloop-thread-1] TopicOperator:134 - Reconciliation #1811983(periodic kube deletethis) KafkaTopic(kafka/deletethis): io.strimzi.operator.topic.TopicStore$InvalidStateException
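For reference, a minimal sketch of applying the topic and checking its status condition; the kafka namespace matches the KafkaTopic(kafka/deletethis) reference in the log lines above, and the manifest file name is illustrative:

# apply the KafkaTopic manifest shown above
kubectl apply -n kafka -f deletethis-topic.yaml

# inspect the status conditions; for this bug they show
# type: Ready, status: "False", reason: InvalidStateException
kubectl get kafkatopic deletethis -n kafka -o yaml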
Expected behavior
The topic is created and reaches the Ready state, and it can be deleted.
Environment (please complete the following information):
- Strimzi version: 0.29.0
- Installation method: Helm chart
- Kubernetes cluster: Kubernetes 1.20, Rancher v2.5.8
- Infrastructure: VMware
YAML files and logs
Cluster:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: demo
spec:
  kafka:
    version: 3.2.0
    replicas: 3
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
        configuration:
          brokerCertChainAndKey:
            secretName: brokercert
            certificate: broker.crt
            key: broker.key
      - name: external
        port: 9094
        type: ingress
        tls: true
        authentication:
          type: tls
        configuration:
          bootstrap:
            host: "demo.kafka.SERVERNAMEGOESHERE"
          brokers:
            - broker: 0
              host: "broker-0.demo.kafka.SERVERNAMEGOESHERE"
            - broker: 1
              host: "broker-1.demo.kafka.SERVERNAMEGOESHERE"
            - broker: 2
              host: "broker-2.demo.kafka.SERVERNAMEGOESHERE"
          brokerCertChainAndKey:
            secretName: brokercert
            certificate: broker.crt
            key: broker.key
    authorization:
      type: simple
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      inter.broker.protocol.version: "3.2"
    storage:
      type: persistent-claim
      class: nas-store
      size: 10Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      class: nas-store
      size: 10Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
Please share the full log from the Topic Operator. Without it, there is no chance to actually understand the issue. Also, please check the existing issues; there might be similar ones already.
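For anyone else gathering the same information, the Topic Operator log can usually be pulled from the topic-operator container of the Entity Operator deployment. A sketch, assuming the deployment follows the usual <cluster>-entity-operator naming pattern, so demo-entity-operator for this cluster:

# capture the Topic Operator log from the Entity Operator deployment
kubectl logs -n kafka deployment/demo-entity-operator -c topic-operator > kafka-entity-operator.txt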
Certainly. I have to go through an approval process to get the log file, but I will post it as soon as I can.
To add to this: I just restarted the topic operator, and now all of my topics are reported as InvalidStateException.
Sounds like something is wrong with the Topic Operator's internal store, but without the logs it is hard to say what. Deleting the Topic Operator's internal topics might help it recover, but that is just a guess at this point.
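A sketch of what that recovery attempt could look like, run from a broker pod. The internal topic names are, to my knowledge, the defaults used by the Topic Operator's store, and client.properties stands in for whatever TLS client settings this cluster's listeners require; verify the names with --list before deleting anything:

# list topics and confirm the internal store topic names first
bin/kafka-topics.sh --bootstrap-server localhost:9093 \
  --command-config /tmp/client.properties --list | grep strimzi

# delete the Topic Operator's internal store topics; the operator
# should recreate them the next time it starts
bin/kafka-topics.sh --bootstrap-server localhost:9093 \
  --command-config /tmp/client.properties \
  --delete --topic __strimzi_store_topic
bin/kafka-topics.sh --bootstrap-server localhost:9093 \
  --command-config /tmp/client.properties \
  --delete --topic __strimzi-topic-operator-kstreams-topic-store-changelog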
PS: This might be one of the duplicates: #6671
Thank you for pointing out that duplicate! Although the error message in my log was not exactly the one listed in that issue, the root cause was the same.
We originally started out on Strimzi 0.25.0, so, as described in that issue, our internal topics were created with a replication factor of 1. While experimenting during the upgrade from 0.27.1 to 0.29.0, I set min.insync.replicas: 2 on the cluster and thought I had removed it, but I double-checked after your post and it was still set in Kubernetes. Removing min.insync.replicas fixed the issue and everything works again. I will now work on increasing the replication factor of those topics so that min.insync.replicas can be set safely; a sketch of that follows below.
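One way to raise the replication factor of an existing topic is kafka-reassign-partitions.sh. A minimal sketch, where the broker IDs, partition count, and connection settings are assumptions for illustration:

# describe the topic first to see its current partitions and replicas
bin/kafka-topics.sh --bootstrap-server localhost:9093 \
  --command-config /tmp/client.properties \
  --describe --topic __strimzi_store_topic

# reassignment plan that places partition 0 on all three brokers
cat > /tmp/raise-rf.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "__strimzi_store_topic", "partition": 0, "replicas": [0, 1, 2] }
  ]
}
EOF

# execute the reassignment; once replication factor is 3,
# min.insync.replicas: 2 can be set without breaking the topic store
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9093 \
  --command-config /tmp/client.properties \
  --reassignment-json-file /tmp/raise-rf.json --execute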
I also got approval to post the log file, so here it is attached for completeness. kafka-entity-operator.txt
Closing as duplicate of #6671.