strimzi-kafka-operator

move KafkaConnect and MirrorMaker 1.0/2.0 to statefulsets

Open arnitolog opened this issue 3 years ago • 6 comments

A StatefulSet gives pods predictable names. That, in turn, allows using the pod name as group.instance.id and enabling static membership, which was designed for cloud applications.
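A minimal sketch of the idea, assuming the pod name is available through the HOSTNAME environment variable (the Kubernetes default) and that the client accepts plain Kafka consumer properties. The function name and the example pod name are hypothetical, and whether a given Connect worker honors this setting is not covered here.

```python
import os


def static_membership_config(group_id, env=None):
    """Build consumer properties that reuse the pod name as group.instance.id.

    Assumption: the pod name is exposed as HOSTNAME, which only stays the
    same across restarts if the pod name itself is stable.
    """
    env = os.environ if env is None else env
    pod_name = env.get("HOSTNAME")
    if not pod_name:
        raise ValueError("HOSTNAME is not set; cannot derive group.instance.id")
    return {
        "group.id": group_id,
        # A stable, unique id per pod is what enables static membership.
        "group.instance.id": pod_name,
    }


# Hypothetical pod name, used only for illustration.
config = static_membership_config("prod-kafka", {"HOSTNAME": "my-connect-0"})
print(config)
```

With a Deployment the pod name gets a new random suffix on every rollout, so the derived id changes each time and static membership brings no benefit.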

arnitolog avatar Mar 30 '21 14:03 arnitolog

@scholzj Is this a valid issue for us? AFAIU we are moving away from SS.

sknot-rh avatar Aug 06 '21 07:08 sknot-rh

Well, you can also see it as a move away from Deployments. So I would keep it open, but not start working on it until after we deal with StatefulSets.

scholzj avatar Aug 06 '21 08:08 scholzj

@scholzj, @sknot-rh what do you mean by saying "moving away from SS."?

arnitolog avatar Aug 06 '21 08:08 arnitolog

The plan is to not use StatefulSets anymore and manage the pods directly. In some areas, the StatefulSets are very limiting.

scholzj avatar Aug 06 '21 08:08 scholzj

ok, thanks for the info

arnitolog avatar Aug 06 '21 08:08 arnitolog

Triaged on 12.4.2022: This makes sense. The idea is that rolling updates of the Deployments spawn new nodes with new addresses, which causes problems with the scheduling of Connect tasks because Connect sees them as new workers instead of a restart of the old worker. With StrimziPodSets this might be improved: the nodes might have a fixed address and might really just restart instead of creating a new node. This should be done only after we move away from StatefulSets to StrimziPodSets; we probably do not want to introduce StatefulSets here at this point.

scholzj avatar Apr 12 '22 14:04 scholzj

This is still an issue with StrimziPodSets: when a k8s cluster is upgraded, the nodes change during the rolling update, hence the brokers move around. There is no way to set a unique group.instance.id for each KafkaConnect pod forming the Connect deployment.

kinihun avatar Oct 12 '22 10:10 kinihun

I'm not sure I follow what you mean. Connect does not use StrimziPodSets; that is just a future plan. If you think that would not help, please elaborate on why, because it is not obvious.

scholzj avatar Oct 12 '22 11:10 scholzj

Apologies for the lack of clarity. The issue I am observing is quite similar and has to do with KafkaConnect. As you can see from the error below, the Group coordinator is fixed, hence the consumer cannot locate it after a cluster upgrade in which the nodes undergo a rolling upgrade. I am using Strimzi 0.30.0 with Kafka version 3.2.0, which is why I mentioned StrimziPodSet, which is now the default. This issue is preventing my Connector tasks from executing at polling intervals.

"2022-10-11 09:28:28,738 INFO [Consumer clientId=consumer-prod-kafka-1, groupId=prod-kafka] Discovered group coordinator prod-kafka-kafka-1.prod-kafka-kafka-brokers.kafka.svc:9093 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - prod-kafka-connect-offsets]"
"2022-10-11 09:28:28,675 INFO [Consumer clientId=consumer-prod-kafka-2, groupId=prod-kafka] Discovered group coordinator prod-kafka-kafka-1.prod-kafka-kafka-brokers.kafka.svc:9093 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - prod-kafka-connect-status]"
"2022-10-11 09:28:28,637 INFO [Consumer clientId=consumer-prod-kafka-1, groupId=prod-kafka] Discovered group coordinator prod-kafka-kafka-1.prod-kafka-kafka-brokers.kafka.svc:9093 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - prod-kafka-connect-offsets]"
"2022-10-11 09:28:28,637 INFO [Consumer clientId=consumer-prod-kafka-1, groupId=prod-kafka] Group coordinator prod-kafka-kafka-1.prod-kafka-kafka-brokers.kafka.svc:9093 (id: 2147483646 rack: null) is unavailable or invalid due to cause: coordinator unavailable.isDisconnected: false. Rediscovery will be attempted. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - prod-kafka-connect-offsets]"
"2022-10-11 09:28:28,637 INFO [Consumer clientId=consumer-prod-kafka-1, groupId=prod-kafka] Requesting disconnect from last known coordinator prod-kafka-kafka-1.prod-kafka-kafka-brokers.kafka.svc:9093 (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - prod-kafka-connect-offsets]"
"2022-10-11 09:28:28,636 INFO [Consumer clientId=consumer-prod-kafka-1, groupId=prod-kafka] Node 2147483646 disconnected. (org.apache.kafka.clients.NetworkClient) [KafkaBasedLog Work Thread - prod-kafka-connect-offsets]"

kinihun avatar Oct 12 '22 13:10 kinihun

I do not think this has anything to do with this issue. The coordinator simply changes while the Kafka broker restarts. If you have something more, please start a discussion or something and we can continue there: https://github.com/strimzi/strimzi-kafka-operator/discussions
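For reading such logs, a small sketch of how the coordinator id maps back to a broker. It assumes, as far as I understand the Java client, that the coordinator connection id is derived as Integer.MAX_VALUE minus the broker id; the helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def broker_id_from_coordinator_id(coordinator_id):
    """Recover the broker id from a coordinator connection id.

    Assumption: the client computes the connection id as
    Integer.MAX_VALUE - broker id.
    """
    return INT32_MAX - coordinator_id


# Id taken from the log lines quoted in this thread.
print(broker_id_from_coordinator_id(2147483646))
```

Under that assumption, the id in the log corresponds to the broker with id 1, which is consistent with the host name prod-kafka-kafka-1 in the same entries.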

scholzj avatar Oct 12 '22 13:10 scholzj

The move of Connect and MM2 to StrimziPodSets was done in #8090 as part of a new feature gate. MM1 is deprecated and will be removed in the future, so we do not plan any changes to that.

scholzj avatar Feb 17 '23 14:02 scholzj