strimzi-kafka-operator

Protect KafkaTopic resources from switching between different Kafka clusters using cluster or topic IDs

Open · joshsouza opened this issue · 7 comments

Bug Description

In a scenario where you have two Kafka clusters within the same namespace, both with the Entity operator running and synchronizing KafkaTopic objects, the first cluster to spin up will "win" ownership of the KafkaTopic k8s resource for the same topic. This results in three significant issues:

  1. Synchronization errors - While both clusters are running, the second cluster will throw `Error reconciling KafkaTopic` and `io.fabric8.kubernetes.client.KubernetesClientException` errors, because it attempts to create a new resource when one already exists.
  2. Data loss conditions - If a topic with the same name exists in two different clusters, and the KafkaTopic, labelled as belonging to one cluster, is deleted, the topic will be deleted from both clusters.
  3. Data loss conditions - (essentially the same situation as above) If the first cluster is deleted, the KafkaTopic resources remain in k8s, labelled as belonging to the now-gone cluster. If you delete these (which would appear to be a safe operation), the Entity operator's next synchronization will delete those same topics in its own cluster (even though it was never able to synchronize properly, as noted above).

The end result is that administrators performing what could be perceived as a safe "tidy-up" operation of resources associated with a deleted cluster could inadvertently delete real data on a live cluster.

This is also highly dangerous if you have a staging and a production cluster (or, in an A/B rollout, two production clusters): a user deleting a KafkaTopic associated with one of the two clusters can have an impact on the other.

We have noticed that Cruise Control and Connectors frequently use the same topic names, and in our own use we reuse topic names across our applications, so we could be impacted by this.

One potential remedy would be to ensure that k8s resources are guaranteed to be uniquely named. Another would be to put checks in place such that synchronizing a delete from a k8s resource respects the strimzi.io/cluster label, and does not process a delete originating from a different Kafka cluster's KafkaTopic resources. I'm sure there are other solutions as well.
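For reference, the only link between a KafkaTopic resource and a specific Kafka cluster is the strimzi.io/cluster label; a minimal sketch of such a resource (topic name and sizing are illustrative, not taken from the report):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: shared-topic
  labels:
    # The only association with a particular Kafka cluster. A delete-time
    # guard could verify this label before propagating the deletion.
    strimzi.io/cluster: first-cluster
spec:
  partitions: 3
  replicas: 3
```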

Steps to reproduce

  1. Spin up a fresh Kafka cluster named first-cluster and let it come fully up. (Ensure you have the topicOperator enabled; a minimal manifest is sketched after this list. It may also be helpful to deploy a Kafka UI or similar tooling so that you can interact with the cluster.)
  2. Create two new topics from within Kafka itself (i.e. via the UI or a CLI, not via the KafkaTopic resource), named shared-topic and remnant-topic
  3. Observe that the shared-topic and remnant-topic KafkaTopic resources are generated automatically on the Entity operator's next synchronization, and that they have the label strimzi.io/cluster: first-cluster
  4. Spin up a fresh Kafka cluster in the same namespace, named second-cluster, let it come fully up. (Identical configuration to the first-cluster)
  5. Create the same two topics, shared-topic and remnant-topic, from within Kafka itself on the second-cluster
  6. FIRST ISSUE: Observe the logs of the Entity operator for the second-cluster. You will see errors relating to synchronizing topics
  7. Delete the shared-topic KafkaTopic resource via kubectl
  8. SECOND ISSUE: Observe that the shared-topic topic is deleted from both clusters, rather than only from the first-cluster as the labels would indicate (it may take some time for this to propagate, as it occurs on the next synchronization)
  9. Delete the first-cluster Kafka resource, wait for it to be fully cleaned up
  10. Observe that the remnant-topic KafkaTopic remains (as well as any other topics that may have been auto-synced) with the label strimzi.io/cluster: first-cluster
  11. Observe that the errors within the Entity operator for second-cluster continue to occur
  12. Delete the remnant-topic KafkaTopic
  13. THIRD ISSUE: Observe the logs in the Entity operator during the next sync; the topic will be deleted
  14. THIRD ISSUE: Observe that the remnant-topic has been deleted from second-cluster
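As referenced in step 1, here is a minimal Kafka manifest with the topicOperator enabled. This is a hedged sketch: the listener, storage, and ZooKeeper settings are illustrative defaults rather than anything from the report, and second-cluster in step 4 would be identical apart from the name:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: first-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    # Deploys the Entity operator with the Topic Operator enabled, which
    # is what synchronizes Kafka topics to KafkaTopic resources.
    topicOperator: {}
```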

Expected behavior

  1. Deploying multiple Strimzi Kafka resources in the same namespace should be possible without interference
  2. No errors should occur in the Entity operator logs in situations where two different Kafka clusters have the same topics
  3. Deleting a KafkaTopic whose labels indicate the topic belongs to one cluster should have no impact on another cluster. (whether or not that cluster remains online)

Strimzi version

0.33.0

Kubernetes version

1.23.17

Installation method

helm

Infrastructure

Any

Configuration files and logs

No response

Additional context

I recognize that multiple Kafka resources in the same namespace may not be the "ideal" setup with Strimzi, but I would still classify this as a bug: I'd gamble that you could reproduce this issue by creating a cluster, creating the topic, deleting the cluster (leaving the KafkaTopic around), spinning up a new cluster, creating the topic manually in that one, and then deleting the k8s resource.

joshsouza · Sep 18 '23 19:09

This is more or less expected. The general recommendation is to:

  • Have a different namespace for each Kafka cluster
  • Enable the Topic Operator only in one of them
  • Configure each Topic Operator to watch a different namespace (a minimal sketch of this configuration follows this list)
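Assuming the per-namespace layout above, the Topic Operator's watched namespace can be pinned in the Kafka custom resource; a minimal sketch (the namespace name is illustrative):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: first-cluster
  namespace: kafka-first       # illustrative per-cluster namespace
spec:
  # kafka and zookeeper sections omitted for brevity
  entityOperator:
    topicOperator:
      # Scope this Topic Operator to its own cluster's namespace so it
      # never reconciles another cluster's KafkaTopic resources.
      watchedNamespace: kafka-first
```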

In any case, this is solved in the Unidirectional Topic Operator, which will replace the current Topic Operator and remove the bidirectional synchronization. So this is eventually going to be fixed.

scholzj · Sep 18 '23 19:09

Thanks! I'll see if we can try out the UTO for now!

joshsouza · Sep 18 '23 19:09

Triaged in the community call on 21.9.2023: @tombentley will take an additional look at this issue.

scholzj · Sep 21 '23 08:09

Certainly the operators are not intended to be deployed such that two distinct operators are trying to reconcile the same resource. (This is true for all Strimzi operators).

But given the possibility for data loss as a result of this error, it's reasonable to ask whether we could make things a bit safer.

One idea, which was in the UTO proposal for a while but was eventually removed, was to include a `status.clusterId`. This would be checked by an operator before it started any operation on the Kafka cluster:

  • If it was null, the CR status would be updated with the non-null cluster ID and the reconciliation would proceed as normal.
  • If it was non-null and matched the cluster ID of the cluster the operator was connected to, the operator would proceed with the reconciliation as normal.
  • If it was non-null and differed, the operator would log an error and simply ignore that resource for the rest of the reconciliation.

This isn't completely bullet-proof, e.g. the Kafka cluster could be silently replaced with a different one without the Admin client saying a word to the application, but it would be much safer than the current behavior. This same idea could also be applied to the User Operator, for example.

A similar kind of protection could also be achieved using a `status.topicId` field.
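To make the suggestion concrete, a sketch of what a KafkaTopic status carrying such safeguard fields might look like. The field names come from the comment above and were, at the time, hypothetical rather than part of any released API; the ID values are purely illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: shared-topic
  labels:
    strimzi.io/cluster: first-cluster
spec:
  partitions: 3
  replicas: 3
status:
  # Hypothetical safeguard fields: written on the first reconciliation,
  # then compared on every later one. A mismatch means the resource was
  # created against a different Kafka cluster, so the operator would log
  # an error and skip the resource.
  clusterId: LH1kpxdGTSuWt4NUo08PBw   # illustrative Kafka cluster ID
  topicId: 6jF0yeVeSyWmDaMIJkMUvA     # illustrative topic ID
```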

tombentley · Sep 25 '23 03:09

Triaged on the community call on 5.10.2023: should be postponed until the next call, as it is not really clear what the previous comment actually suggests. CC @tombentley

scholzj · Oct 05 '23 16:10

Discussed on the community call on 19.10.2023: There are no plans to fix this in the Bidirectional Topic Operator, as it will be removed in the future. But this can be tracked as an enhancement for the Unidirectional Topic Operator. A proposal should be written to clarify in more detail how the cluster and topic IDs might be used for a protection like this.

scholzj · Oct 19 '23 08:10

The original reproducer for this issue is now obsolete due to the new Topic Operator's unidirectional reconciliation. None of the three issues mentioned there happens today. That said, the point about using `.status.topicId` as an additional safeguard is valid and should be implemented.

fvaleri · May 19 '24 15:05