strimzi-kafka-operator
Scale-up and scale-down are not working in KRaft mode
Describe the bug
Scaling Kafka up from 3 pods to 4 does not work; the Kafka pod fails on startup with the following error:
STRIMZI_BROKER_ID=0
Preparing truststore for replication listener
Adding /opt/kafka/cluster-ca-certs/ca.crt to truststore /tmp/kafka/cluster.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore for replication listener is complete
Looking for the right CA
Found the right CA: /opt/kafka/cluster-ca-certs/ca.crt
Preparing keystore for replication and clienttls listener
Preparing keystore for replication and clienttls listener is complete
Preparing truststore for client authentication
Adding /opt/kafka/client-ca-certs/ca.crt to truststore /tmp/kafka/clients.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore for client authentication is complete
Starting Kafka with configuration:
##############################
##############################
# This file is automatically generated by the Strimzi Cluster Operator
# Any changes to this file will be ignored and overwritten!
##############################
##############################
##########
# Broker ID
##########
broker.id=0
node.id=0
##########
# KRaft configuration
##########
process.roles=broker,controller
controller.listener.names=CONTROLPLANE-9090
controller.quorum.voters=0@my-cluster-5261ed90-kafka-0.my-cluster-5261ed90-kafka-brokers.namespace-0.svc.cluster.local:9090,1@my-cluster-5261ed90-kafka-1.my-cluster-5261ed90-kafka-brokers.namespace-0.svc.cluster.local:9090,2@my-cluster-5261ed90-kafka-2.my-cluster-5261ed90-kafka-brokers.namespace-0.svc.cluster.local:9090,3@my-cluster-5261ed90-kafka-3.my-cluster-5261ed90-kafka-brokers.namespace-0.svc.cluster.local:9090
##########
# Kafka message logs configuration
##########
log.dirs=/var/lib/kafka/data/kafka-log0
##########
# Control Plane listener
##########
listener.name.controlplane-9090.ssl.keystore.location=/tmp/kafka/cluster.keystore.p12
listener.name.controlplane-9090.ssl.keystore.password=[hidden]
listener.name.controlplane-9090.ssl.keystore.type=PKCS12
listener.name.controlplane-9090.ssl.truststore.location=/tmp/kafka/cluster.truststore.p12
listener.name.controlplane-9090.ssl.truststore.password=[hidden]
listener.name.controlplane-9090.ssl.truststore.type=PKCS12
listener.name.controlplane-9090.ssl.client.auth=required
##########
# Replication listener
##########
listener.name.replication-9091.ssl.keystore.location=/tmp/kafka/cluster.keystore.p12
listener.name.replication-9091.ssl.keystore.password=[hidden]
listener.name.replication-9091.ssl.keystore.type=PKCS12
listener.name.replication-9091.ssl.truststore.location=/tmp/kafka/cluster.truststore.p12
listener.name.replication-9091.ssl.truststore.password=[hidden]
listener.name.replication-9091.ssl.truststore.type=PKCS12
listener.name.replication-9091.ssl.client.auth=required
##########
# Listener configuration: PLAIN-9092
##########
##########
# Listener configuration: TLS-9093
##########
listener.name.tls-9093.ssl.keystore.location=/tmp/kafka/cluster.keystore.p12
listener.name.tls-9093.ssl.keystore.password=[hidden]
listener.name.tls-9093.ssl.keystore.type=PKCS12
##########
# Common listener configuration
##########
listeners=CONTROLPLANE-9090://0.0.0.0:9090,REPLICATION-9091://0.0.0.0:9091,PLAIN-9092://0.0.0.0:9092,TLS-9093://0.0.0.0:9093
advertised.listeners=REPLICATION-9091://my-cluster-5261ed90-kafka-0.my-cluster-5261ed90-kafka-brokers.namespace-0.svc:9091,PLAIN-9092://my-cluster-5261ed90-kafka-0.my-cluster-5261ed90-kafka-brokers.namespace-0.svc:9092,TLS-9093://my-cluster-5261ed90-kafka-0.my-cluster-5261ed90-kafka-brokers.namespace-0.svc:9093
listener.security.protocol.map=CONTROLPLANE-9090:SSL,REPLICATION-9091:SSL,PLAIN-9092:PLAINTEXT,TLS-9093:SSL
inter.broker.listener.name=REPLICATION-9091
sasl.enabled.mechanisms=
ssl.secure.random.implementation=SHA1PRNG
ssl.endpoint.identification.algorithm=HTTPS
##########
# User provided configuration
##########
default.replication.factor=3
inter.broker.protocol.version=3.2
log.message.format.version=3.2
min.insync.replicas=2
offsets.topic.replication.factor=3
transaction.state.log.min.isr=2
transaction.state.log.replication.factor=3
Kraft storage is already formatted
+ exec /usr/bin/tini -w -e 143 -- /opt/kafka/bin/kafka-server-start.sh /tmp/strimzi.properties
2022-05-25 08:24:32,210 INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$) [main]
2022-05-25 08:24:32,623 INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util) [main]
2022-05-25 08:24:32,845 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Recovering unflushed segment 0 (kafka.log.LogLoader) [main]
2022-05-25 08:24:32,847 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Loading producer state till offset 0 with message format version 2 (kafka.log.UnifiedLog$) [main]
2022-05-25 08:24:32,847 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Reloading from producer snapshot and rebuilding producer state from offset 0 (kafka.log.UnifiedLog$) [main]
2022-05-25 08:24:32,849 INFO Deleted producer state snapshot /var/lib/kafka/data/kafka-log0/__cluster_metadata-0/00000000000000000009.snapshot (kafka.log.SnapshotFile) [main]
2022-05-25 08:24:32,851 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Producer state recovery took 3ms for snapshot load and 0ms for segment recovery from offset 0 (kafka.log.UnifiedLog$) [main]
2022-05-25 08:24:32,882 INFO [ProducerStateManager partition=__cluster_metadata-0] Wrote producer snapshot at offset 9 with 0 producer ids in 11 ms. (kafka.log.ProducerStateManager) [main]
2022-05-25 08:24:32,916 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Loading producer state till offset 9 with message format version 2 (kafka.log.UnifiedLog$) [main]
2022-05-25 08:24:32,916 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Reloading from producer snapshot and rebuilding producer state from offset 9 (kafka.log.UnifiedLog$) [main]
2022-05-25 08:24:32,917 INFO [ProducerStateManager partition=__cluster_metadata-0] Loading producer state from snapshot file 'SnapshotFile(/var/lib/kafka/data/kafka-log0/__cluster_metadata-0/00000000000000000009.snapshot,9)' (kafka.log.ProducerStateManager) [main]
2022-05-25 08:24:32,919 INFO [LogLoader partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Producer state recovery took 3ms for snapshot load and 0ms for segment recovery from offset 9 (kafka.log.UnifiedLog$) [main]
2022-05-25 08:24:33,319 INFO [raft-expiration-reaper]: Starting (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper) [raft-expiration-reaper]
2022-05-25 08:24:33,519 ERROR Exiting Kafka due to fatal exception (kafka.Kafka$) [main]
java.lang.IllegalStateException: Configured voter set: [0, 1, 2, 3] is different from the voter set read from the state file: [0, 1, 2]. Check if the quorum configuration is up to date, or wipe out the local state file if necessary
at org.apache.kafka.raft.QuorumState.initialize(QuorumState.java:132)
at org.apache.kafka.raft.KafkaRaftClient.initialize(KafkaRaftClient.java:364)
at kafka.raft.KafkaRaftManager.buildRaftClient(RaftManager.scala:203)
at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:125)
at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:76)
at kafka.Kafka$.buildServer(Kafka.scala:79)
at kafka.Kafka$.main(Kafka.scala:87)
at kafka.Kafka.main(Kafka.scala)
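The fatal exception is the quorum state check: the regenerated controller.quorum.voters now lists nodes [0, 1, 2, 3], while the state persisted when the volume was first formatted only knows voters [0, 1, 2]. A minimal way to compare the two on the failing pod, assuming the paths shown in the log above and that the KRaft quorum state is kept in a quorum-state file inside the metadata log directory (both are assumptions, not verified here):

# Hypothetical check: voters recorded at format time vs. the regenerated broker config.
kubectl exec -n namespace-0 my-cluster-5261ed90-kafka-0 -- \
  cat /var/lib/kafka/data/kafka-log0/__cluster_metadata-0/quorum-state
kubectl exec -n namespace-0 my-cluster-5261ed90-kafka-0 -- \
  grep controller.quorum.voters /tmp/strimzi.properties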
To Reproduce
Steps to reproduce the behavior:
- Set up the Cluster Operator with KRaft enabled
- Create a Kafka CR with 3 replicas
- Scale to 4 replicas (see the sketch after this list)
- See the error in the Kafka pod
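For reference, the scale step can be triggered by bumping spec.kafka.replicas on the Kafka CR. A minimal sketch, assuming kubectl access to the namespace and the resource name from the YAML below (kubectl patch is just one way to do it):

# Hypothetical way to trigger the scale-up in step 3 (CR name/namespace taken from the YAML below).
kubectl patch kafka my-cluster-5261ed90 -n namespace-0 --type merge \
  -p '{"spec":{"kafka":{"replicas":4}}}'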
Expected behavior
The Kafka cluster scales from 3 to 4 broker pods and all pods become ready.
Environment (please complete the following information):
- Strimzi version: main
- Installation method: YAML
- Kubernetes cluster: OpenShift 4.10
- Infrastructure: Openstack
YAML files and logs
Kafka with 3 replicas
apiVersion: v1
items:
- apiVersion: kafka.strimzi.io/v1beta2
  kind: Kafka
  metadata:
    annotations:
      strimzi.io/pause-reconciliation: "false"
    labels:
      test.case: testPauseReconciliationInKafkaAndKafkaConnectWithConnector
    name: my-cluster-5261ed90
    namespace: namespace-0
  spec:
    kafka:
      config:
        default.replication.factor: 3
        inter.broker.protocol.version: "3.2"
        log.message.format.version: "3.2"
        min.insync.replicas: 2
        offsets.topic.replication.factor: 3
        transaction.state.log.min.isr: 2
        transaction.state.log.replication.factor: 3
      listeners:
      - name: plain
        port: 9092
        tls: false
        type: internal
      - name: tls
        port: 9093
        tls: true
        type: internal
      logging:
        loggers:
          kafka.root.logger.level: DEBUG
        type: inline
      replicas: 3
      storage:
        deleteClaim: true
        size: 1Gi
        type: persistent-claim
      version: 3.2.0
    zookeeper:
      logging:
        loggers:
          zookeeper.root.logger: DEBUG
        type: inline
      replicas: 3
      storage:
        deleteClaim: true
        size: 1Gi
        type: persistent-claim
Kafka with 4 replicas
apiVersion: v1
items:
- apiVersion: kafka.strimzi.io/v1beta2
  kind: Kafka
  metadata:
    annotations:
      strimzi.io/pause-reconciliation: "false"
    creationTimestamp: "2022-05-25T08:16:08Z"
    generation: 2
    labels:
      test.case: testPauseReconciliationInKafkaAndKafkaConnectWithConnector
    name: my-cluster-5261ed90
    namespace: namespace-0
    resourceVersion: "14487706"
    uid: d7843a81-0409-4769-8858-7ad8d6943a2a
  spec:
    kafka:
      config:
        default.replication.factor: 3
        inter.broker.protocol.version: "3.2"
        log.message.format.version: "3.2"
        min.insync.replicas: 2
        offsets.topic.replication.factor: 3
        transaction.state.log.min.isr: 2
        transaction.state.log.replication.factor: 3
      listeners:
      - name: plain
        port: 9092
        tls: false
        type: internal
      - name: tls
        port: 9093
        tls: true
        type: internal
      logging:
        loggers:
          kafka.root.logger.level: DEBUG
        type: inline
      replicas: 4
      storage:
        deleteClaim: true
        size: 1Gi
        type: persistent-claim
      version: 3.2.0
    zookeeper:
      logging:
        loggers:
          zookeeper.root.logger: DEBUG
        type: inline
      replicas: 3
      storage:
        deleteClaim: true
        size: 1Gi
        type: persistent-claim
  status:
    conditions:
    - lastTransitionTime: "2022-05-25T08:20:30.678Z"
      message: Error while waiting for restarted pod my-cluster-5261ed90-kafka-0 to
        become ready
      reason: FatalProblem
      status: "True"
      type: NotReady
    observedGeneration: 2