
Topic Operator auto-regenerates topic after deletion

Open nautiam opened this issue 2 years ago • 24 comments

Describe the bug: I create a Kafka cluster with the Topic Operator, then create "my-topic" using the KafkaTopic CRD. But when I use the CLI to delete the topic, it is regenerated.

To Reproduce Steps to reproduce the behavior:

  1. Create Custom Resource 'Kafka'
  2. Create Custom Resource 'KafkaTopic'
  3. Go to Zookeeper pod
  4. Run command '$ bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic my-topic'
  5. Kafka Topic 'my-topic' is auto regenerated

Expected behavior: The topic is deleted and will not be regenerated.

Environment (please complete the following information):

  • Strimzi version: 0.25.0
  • Installation method: [e.g. YAML files, Helm chart, OperatorHub.io]
  • Kubernetes cluster: OpenShift 4.9
  • Infrastructure: Baremetal

YAML files and logs

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafka:
    authorization:
      type: simple
    config:
      inter.broker.protocol.version: '2.8'
      log.message.format.version: '2.8'
      transaction.state.log.min.isr: 2
      replica.fetch.max.bytes: 41943040
      max.message.bytes: 10485760
      offsets.topic.replication.factor: 3
    listeners:
      - authentication:
          type: scram-sha-512
        name: plain
        port: 9092
        tls: false
        type: internal
      - authentication:
          type: scram-sha-512
        name: tls
        port: 9093
        tls: true
        type: internal
      - authentication:
          type: scram-sha-512
        name: external
        port: 9094
        tls: true
        type: route
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          key: kafka-metrics-config.yml
          name: kafka-metrics
    replicas: 3
    storage:
      class: nfs
      deleteClaim: false
      size: 5Gi
      type: persistent-claim
    version: 2.8.0
  kafkaExporter:
    groupRegex: .*
    topicRegex: .*
  zookeeper:
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          key: zookeeper-metrics-config.yml
          name: kafka-metrics
    replicas: 3
    storage:
      class: nfs
      deleteClaim: false
      size: 5Gi
      type: persistent-claim
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  config: {}
  partitions: 10
  replicas: 3
  topicName: my-topic

Additional context After regeneration, both partitions and replicas of topic are 1.

nautiam avatar Dec 05 '21 05:12 nautiam

It looks like your cluster does not have topic auto-creation disabled. So you have to make sure there are no clients using the topic when you delete it; otherwise they would just recreate it with the default settings (1 partition and 1 replica) when they consume / produce. There are several older issues and discussions about this; please have a look at them.
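For reference, topic auto-creation is a broker setting that can be turned off from the Kafka custom resource. A minimal sketch, showing only the relevant fragment of the CR from this issue (the brokers will roll after the change):

```yaml
spec:
  kafka:
    config:
      # Disable automatic topic creation on metadata requests
      auto.create.topics.enable: false
```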

PS: NFS storage does not work with Kafka; you should use block storage.

scholzj avatar Dec 05 '21 10:12 scholzj

It looks like your cluster does not have topic auto-creation disabled. So you have to make sure there are no clients using the topic when you delete it; otherwise they would just recreate it with the default settings (1 partition and 1 replica) when they consume / produce.

I'm pretty sure that there are no clients using the topic. I even tried creating some totally new topics using KafkaTopic and repeating all the steps, and I get the same issue. The topic is auto-regenerated whether I delete it via the CLI or via Java code.

nautiam avatar Dec 05 '21 11:12 nautiam

Well, if the operator recreated it, it would do so with the original settings. There is an easy test for it: disable topic auto-creation, wait until the brokers roll, and try it again. But if that does not help and you think the operator is doing it, then it would be great if you could provide a DEBUG log from the Topic Operator to show what it is doing and why.
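As a sketch of how to get that DEBUG log: Strimzi supports inline logging configuration for the Topic Operator in the Kafka CR. The exact logger keys may vary by Strimzi version, so treat this fragment as an assumption to check against the documentation for your release:

```yaml
spec:
  entityOperator:
    topicOperator:
      logging:
        type: inline
        loggers:
          # Raise the Topic Operator's root logger to DEBUG
          rootLogger.level: DEBUG
```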

scholzj avatar Dec 05 '21 11:12 scholzj

I tried removing the Topic Operator when creating the Kafka cluster, and now I can delete topics without them being regenerated.

nautiam avatar Dec 05 '21 14:12 nautiam

Hi @nautiam, I tried your use case and it works as expected:

When I use the following command:

$ kubectl exec my-cluster-kafka-0 -- bin/kafka-topics.sh --bootstrap-server :9092 --topic my-topic --delete

The ZK watcher is notified as soon as the delete operation returns and triggers a new reconciliation:

2021-12-06 08:32:17,69936 INFO  [ZkClient-EventThread-20-localhost:2181] ZkTopicsWatcher:126 - Topics deleted from ZK for watch 1: [my-topic]
2021-12-06 08:32:17,70834 INFO  [ZkClient-EventThread-20-localhost:2181] ZkTopicsWatcher:142 - Topics created in ZK for watch 1: []
2021-12-06 08:32:18,72096 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #506(/brokers/topics 1:-my-topic) KafkaTopic(test/my-topic): Reconciling topic my-topic, k8sTopic:nonnull, kafkaTopic:null, privateTopic:nonnull
2021-12-06 08:32:18,73110 INFO  [OkHttp https://10.96.0.1/...] K8sTopicWatcher:56 - Reconciliation #510(kube =my-topic) KafkaTopic(test/my-topic): event MODIFIED on resource my-topic generation=4, labels={strimzi.io/cluster=my-cluster}
2021-12-06 08:32:19,36817 INFO  [OkHttp https://10.96.0.1/...] K8sTopicWatcher:56 - Reconciliation #513(kube -my-topic) KafkaTopic(test/my-topic): event DELETED on resource my-topic generation=4, labels={strimzi.io/cluster=my-cluster}
2021-12-06 08:32:19,76860 INFO  [vert.x-eventloop-thread-1] K8sTopicWatcher:60 - Reconciliation #530(kube =my-topic) KafkaTopic(test/my-topic): Success processing event MODIFIED on resource my-topic with labels {strimzi.io/cluster=my-cluster}
2021-12-06 08:32:19,77826 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #536(kube -my-topic) KafkaTopic(test/my-topic): Reconciling topic null, k8sTopic:null, kafkaTopic:null, privateTopic:null
2021-12-06 08:32:19,77907 INFO  [vert.x-eventloop-thread-1] K8sTopicWatcher:60 - Reconciliation #540(kube -my-topic) KafkaTopic(test/my-topic): Success processing event DELETED on resource my-topic with labels {strimzi.io/cluster=my-cluster}

And this is the end result:

$ kubectl get kt
NAME                                                                                               CLUSTER      PARTITIONS   REPLICATION FACTOR   READY
consumer-offsets---84e7a678d08f4bd226872e5cdd4eb527fadc1c6a                                        my-cluster   50           3                    True
strimzi-store-topic---effb8e3e057afce1ecf67c3f5d8e4e3ff177fc55                                     my-cluster   1            1                    True
strimzi-topic-operator-kstreams-topic-store-changelog---b75e702040b99be8a9263134de3507fc0cc4017b   my-cluster   1            1                    True

You can check what happens in your topic-operator logs when you delete that topic and compare with mine.

fvaleri avatar Dec 06 '21 08:12 fvaleri

I tried removing the Topic Operator when creating the Kafka cluster, and now I can delete topics without them being regenerated.

I'm facing the same problem of topics being recreated after I delete them (both the KafkaTopic and the topic) in one of my Kafka clusters. Automatic topic creation is disabled, but the topics still get created (with the unique specs configured in spec.kafka.config of that Kafka cluster). Can you explain what exactly you did to make it work?

AvihuHenya avatar Mar 22 '22 11:03 AvihuHenya

I'm facing the same problem of topics being recreated after I delete them (both the KafkaTopic and the topic) in one of my Kafka clusters. Automatic topic creation is disabled, but the topics still get created (with the unique specs configured in spec.kafka.config of that Kafka cluster). Can you explain what exactly you did to make it work?

First, do not use the topicOperator:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  entityOperator:
    userOperator: {}

Then, write code that connects to the Kafka cluster and deletes the topic.

nautiam avatar Mar 22 '22 14:03 nautiam

@nautiam, you don't necessarily need to write code to delete a topic. You can simply use kafka-topics.sh, which is included in the official Apache Kafka distribution.

fvaleri avatar Mar 22 '22 14:03 fvaleri

I'm facing the same problem of topics being recreated after I delete them (both the KafkaTopic and the topic) in one of my Kafka clusters. Automatic topic creation is disabled, but the topics still get created (with the unique specs configured in spec.kafka.config of that Kafka cluster). Can you explain what exactly you did to make it work?

First, do not use the topicOperator:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  entityOperator:
    userOperator: {}

Then, write code that connects to the Kafka cluster and deletes the topic.

Interesting, but that's not the ideal solution for most users, who want the Topic Operator's capabilities. Does anyone have another idea?

AvihuHenya avatar Mar 22 '22 15:03 AvihuHenya

@nautiam, you don't necessarily need to write code to delete a topic. You can simply use kafka-topics.sh, which is included in the official Apache Kafka distribution.

Yes, it might be an option.

Interesting, but that's not the ideal solution for most users, who want the Topic Operator's capabilities. Does anyone have another idea?

I think we have to update the TopicOperator code to fix this issue. When I read the code of Kafka's delete-topic function, I found that it returns a future. That means that even though the delete call returns successfully, the topic is actually still being deleted in the background. If we call any topic-related function while that deletion is in progress, such as listing topics or getting a topic's config, the Kafka cluster will auto-create the topic with the default replication. I don't know exactly what the Strimzi code does, but I guess this is the reason. That would also explain why creating and deleting a topic with the TopicOperator works fine, but creating a topic, producing many records to it for a while, and then deleting it via the TopicOperator can hit this issue.
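The race described above can be illustrated with a toy simulation. This is not the real Kafka client (the actual AdminClient delete-topics call returns futures per topic and talks to brokers); the sketch below only mimics that shape with Python's concurrent.futures and an in-memory topic table, to show how a metadata request after an async delete can bring a topic back with broker defaults:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for a broker's topic table (toy model, not the real client).
topics = {"my-topic": {"partitions": 10, "replicas": 3}}

def broker_delete(name):
    """Simulates the broker deleting a topic asynchronously."""
    time.sleep(0.2)              # server-side deletion takes a while
    topics.pop(name, None)

def describe(name, auto_create=True):
    """Simulates a metadata request while auto-creation is enabled."""
    if name not in topics and auto_create:
        topics[name] = {"partitions": 1, "replicas": 1}   # broker defaults
    return topics.get(name)

with ThreadPoolExecutor() as pool:
    future = pool.submit(broker_delete, "my-topic")  # the "delete" call returns at once
    future.result()                                  # waiting on the future gives real completion
    # A metadata request issued after the deletion, with auto-creation still on,
    # silently brings the topic back with the broker defaults:
    recreated = describe("my-topic")

print(recreated)  # {'partitions': 1, 'replicas': 1}
```

The same shape explains the "1 partition and 1 replica" topics reported in this thread: whoever touches the topic's metadata first after the delete wins, and with auto-creation enabled that touch recreates it.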

nautiam avatar Mar 22 '22 15:03 nautiam

👍 I also have this problem. delete.topic.enable is true, there is no traffic on my topic, and the deletion gets reverted by the TO. Strangely, it does work sometimes. Here are the TO logs after deleting a topic named test-rr-andrew.

2022-04-18 12:30:01,45314 INFO  [OkHttp https://172.20.0.1/...] K8sTopicWatcher:56 - Reconciliation #13538395(kube -test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): event DELETED on resource test-rr-andrew generation=1, labels={strimzi.io/cluster=event-tracking}
2022-04-18 12:30:01,48327 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #13538401(kube -test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Reconciling topic test-rr-andrew, k8sTopic:null, kafkaTopic:nonnull, privateTopic:nonnull
2022-04-18 12:30:01,48346 INFO  [vert.x-eventloop-thread-1] TopicOperator:372 - Reconciliation #13538404(kube -test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Deleting topic 'test-rr-andrew'
2022-04-18 12:30:01,58894 INFO  [__strimzi-topic-operator-kstreams-f9f1b9a3-65ac-4f53-91a9-5e2f0f379902-StreamThread-1] K8sTopicWatcher:60 - Reconciliation #13538410(kube -test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Success processing event DELETED on resource test-rr-andrew with labels {strimzi.io/cluster=event-tracking}
2022-04-18 12:30:01,60149 INFO  [ZkClient-EventThread-20-localhost:2181] ZkTopicsWatcher:126 - Topics deleted from ZK for watch 66: [test-rr-andrew]
2022-04-18 12:30:01,60189 INFO  [ZkClient-EventThread-20-localhost:2181] ZkTopicsWatcher:142 - Topics created in ZK for watch 66: []
2022-04-18 12:30:02,61090 INFO  [vert.x-eventloop-thread-0] TopicOperator:576 - Reconciliation #13538422(/brokers/topics 66:-test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Reconciling topic null, k8sTopic:null, kafkaTopic:null, privateTopic:null
2022-04-18 12:30:18,09430 INFO  [ZkClient-EventThread-20-localhost:2181] ZkTopicsWatcher:126 - Topics deleted from ZK for watch 67: []
2022-04-18 12:30:18,09446 INFO  [ZkClient-EventThread-20-localhost:2181] ZkTopicsWatcher:142 - Topics created in ZK for watch 67: [test-rr-andrew]
2022-04-18 12:30:19,10782 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #13538440(/brokers/topics 67:+test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Reconciling topic test-rr-andrew, k8sTopic:null, kafkaTopic:nonnull, privateTopic:null
2022-04-18 12:30:19,11918 INFO  [OkHttp https://172.20.0.1/...] K8sTopicWatcher:56 - Reconciliation #13538443(kube +test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): event ADDED on resource test-rr-andrew generation=1, labels={strimzi.io/cluster=event-tracking}
2022-04-18 12:30:19,13898 INFO  [kubernetes-ops-pool-14] CrdOperator:113 - Reconciliation #13538454(/brokers/topics 67:+test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Status of KafkaTopic test-rr-andrew in namespace kafka has been updated
2022-04-18 12:30:19,14535 INFO  [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #13538463(kube +test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Reconciling topic test-rr-andrew, k8sTopic:nonnull, kafkaTopic:nonnull, privateTopic:nonnull
2022-04-18 12:30:19,14540 INFO  [vert.x-eventloop-thread-1] TopicOperator:743 - Reconciliation #13538468(kube +test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): All three topics are identical
2022-04-18 12:30:19,14569 INFO  [vert.x-eventloop-thread-1] K8sTopicWatcher:60 - Reconciliation #13538473(kube +test-rr-andrew) KafkaTopic(kafka/test-rr-andrew): Success processing event ADDED on resource test-rr-andrew with labels {strimzi.io/cluster=event-tracking}

ryanjclark avatar Apr 18 '22 12:04 ryanjclark

I experience the same problem. Topics are regenerated with the original configuration. To me it looks like the two-way sync of the topic operator cannot keep up with the deletion of the KafkaTopic resources and regenerates them because it thinks they're missing. I only experience this when deleting many KafkaTopics at once. For instance, I just deleted 446 resources and 157 got regenerated.

chaehni avatar May 12 '22 12:05 chaehni

I experience the same problem. Topics are regenerated with the original configuration. To me it looks like the two-way sync of the topic operator cannot keep up with the deletion of the KafkaTopic resources and regenerates them because it thinks they're missing. I only experience this when deleting many KafkaTopics at once. For instance, I just deleted 446 resources and 157 got regenerated.

As with the others, please make sure you have topic auto-creation disabled, as that can cause all kinds of issues. But yes, there seems to be some bug which causes this to happen even without auto-creation.

scholzj avatar May 12 '22 12:05 scholzj

@chaehni I can confirm that.

I can reproduce the issue you describe consistently when running a load test which creates a bunch of test topics (e.g. 20). After a successful test run, I do a bulk topic deletion to get rid of these test topics, and I hit the issue. The TO recreates them almost immediately with the same configuration, but empty (the topicId is different).

I guess we are triggering some reconciliation-logic edge case here, which needs to be investigated further. At least we seem to have a reproducer.

Possible workaround: if you look at the TO logs, you may find InvalidStateStoreException warnings. In my case, I found that simply restarting the TO pod before the bulk topic deletion fixes the issue.

fvaleri avatar Jun 16 '22 10:06 fvaleri

@scholzj, but is there a need for the TO to be present at all? I have auto-creation enabled for topics and simply cannot disable it, because my setup is like a central Kafka cluster that multiple environments connect to. Topics are just prefixed by namespace, and I cannot create each topic for every environment by hand. I am thinking of removing the TO completely, because as part of removing an environment I delete its topics as well, but the TO is saving them and in the end recreating them every time I delete topics that are no longer needed.

hari819 avatar Jul 06 '22 15:07 hari819

@hari819 Honestly, that sounds like a very bad practice. How do you know what is inside the topics? How do you track which topics are actually needed and which were just created by mistake? How do you track that the topics have the right settings? Sounds like a mess to me. Normally, disabling topic auto-creation and having some central management sounds like a day-one thing. (Regardless of whether you use the Topic Operator for that management; it's more about auto-creation being a mess than about the TO being something super-amazing.)

That said, the Topic Operator is optional. So you can easily disable it -> just remove the topicOperator section from the Kafka custom resource.

scholzj avatar Jul 06 '22 16:07 scholzj

@scholzj, yes, I get that I won't be able to run "k get kt" and similar commands, because once I remove the TO I do not see anything in the KafkaTopic CRs. But I will make this change only in my development/testing Kafka cluster. The problem is that we have nearly 250-odd environments up at all times (with 30K topics making the cluster bulky). We do have a housekeeping job that takes care of topic deletion when a member finishes their dev/test, along with deleting the environment itself, but my PVCs are getting full once a week because of the TO reconciling the topics. Most importantly, once testing/development is done there is no use for the topics and their data; the next time a developer creates a new namespace, the topics will get created on the central Kafka cluster with a different prefix, so I am just deleting data which is not required at all. I would like to keep the TO in place for all other environments like PREPROD/PROD. Thanks for the quick response and the suggestions.

hari819 avatar Jul 07 '22 03:07 hari819

@scholzj, can you please suggest whether there is an alternative to auto-creation of topics in such a scenario with this many environments?

hari819 avatar Jul 07 '22 03:07 hari819

@scholzj

And now that I have removed the TO from the cluster definition, I am not able to delete topics at all. Odd. I am just running this command in a loop to remove the topics for each environment:

${KAFKA_HOME}/bin/kafka-topics.sh --bootstrap-server kafka-service:9092 --delete --topic $topic-name

The same command was able to delete topics when the TO was enabled; the only problem was that the TO was reconciling the deleted topics.

I am stuck with this now. I am on version 0.27.1 with Kafka version 2.8.1.

hari819 avatar Jul 07 '22 04:07 hari819

yes, I get that I won't be able to run "k get kt" and similar commands, because once I remove the TO I do not see anything in the KafkaTopic CRs

No, that is not what I'm saying. You will have no idea about the topics, because you will not know which were auto-created by mistake or by some random one-off app and which were created intentionally and are actually used. This has nothing to do with whether you can do kubectl get kafkatopics or not.

scholzj avatar Jul 07 '22 08:07 scholzj

@scholzj, yes, if we remove the TO we lose information about the topics, who the owner is, and all the other valuable info. I am thinking I should put it back and have automatic topic creation disabled; there seems to be no other way for me.

hari819 avatar Jul 07 '22 11:07 hari819

Kafka has its own APIs to work with topics. You can list topics using those APIs even without the Topic Operator. There are also many other tools for managing topics.
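As an editor's sketch of that Admin-API approach (the class and function names here are hypothetical; a real implementation would use e.g. kafka-python's KafkaAdminClient, which this in-memory stub stands in for so the logic is runnable anywhere):

```python
class StubAdmin:
    """Hypothetical stand-in for a Kafka admin client; mimics the two
    Admin API calls the helper below needs (list and create topics)."""
    def __init__(self):
        self._topics = set()

    def list_topics(self):
        return set(self._topics)

    def create_topics(self, names):
        self._topics.update(names)

def ensure_topics(admin, namespace, topics):
    """Create any missing '<namespace>.<topic>' topics; return the set created."""
    wanted = {f"{namespace}.{t}" for t in topics}
    missing = wanted - admin.list_topics()
    if missing:
        admin.create_topics(sorted(missing))
    return missing

admin = StubAdmin()
created = ensure_topics(admin, "dev-env-1", ["orders", "payments"])
print(sorted(created))  # ['dev-env-1.orders', 'dev-env-1.payments']
```

A second call with the same arguments creates nothing, so a utility container like the one discussed in this thread could run the helper idempotently at environment startup.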

scholzj avatar Jul 07 '22 15:07 scholzj

Kafka has its own APIs to work with topics. You can list topics using those APIs even without the Topic Operator. There are also many other tools for managing topics.

Thank you @scholzj, I will try to come up with a generic utility container which will take care of topic creation using the APIs.

hari819 avatar Jul 08 '22 06:07 hari819

Triaged on 2.8.2022: There are similar reports, so we should keep this as a bug. It is currently not clear what is causing it.

scholzj avatar Aug 02 '22 15:08 scholzj

@scholzj is there any information we can provide that would help show what is causing the issue?

douglasawh avatar Nov 08 '22 22:11 douglasawh

I guess @tombentley would be the expert on Topic Operator who might know.

scholzj avatar Nov 08 '22 23:11 scholzj

Any news on this? I'm currently facing the same issue: I delete a couple of KafkaTopic k8s resources and they get recreated with the default config (1 partition).

I have auto-creation of topics enabled; however, I can see that those topics have no produce or consume activity.

miguel-cardoso-mindera avatar Sep 26 '23 10:09 miguel-cardoso-mindera

You should disable the auto-creation then.

scholzj avatar Sep 26 '23 16:09 scholzj

The Bidirectional Topic Operator (BTO) has been replaced by the new Unidirectional Topic Operator from Strimzi 0.39. There are no plans to fix any outstanding issues in the old BTO and this issue can be closed.

scholzj avatar Jan 04 '24 19:01 scholzj