strimzi-kafka-operator

At times a newly created KafkaTopic resource comes into ready state only after the periodic reconciliation

Open Ihjaz opened this issue 3 years ago • 7 comments

Describe the bug I'm using only the topic-operator module, without the cluster operator. Intermittently, when a KafkaTopic resource is created, it doesn't come into ready state until the next periodic reconciliation runs.

To Reproduce topic-operator-logs.txt

Steps to reproduce the behavior:

  1. Create a KafkaTopic resource.
  2. Try to connect to the topic from a Kafka client; you will get an "UNKNOWN_TOPIC_OR_PARTITION" error.
  3. Try to connect again after 10 to 15 minutes; the client is now able to connect.
  4. Describe the KafkaTopic; it shows an interval of a few minutes between "Creation Timestamp" and "Last Transition Time":
Name:         oceana.interval.topic
Namespace:    avaya-kafka
Labels:       app.kubernetes.io/managed-by=Helm
              operator.io/kind=topic
Annotations:  meta.helm.sh/release-name: orca
              meta.helm.sh/release-namespace: default
API Version:  kafka.strimzi.io/v1beta1
Kind:         KafkaTopic
Metadata:
  Creation Timestamp:  2021-03-08T15:15:07Z
  Generation:          1
  Resource Version:    1455674
  Self Link:           /apis/kafka.strimzi.io/v1beta1/namespaces/avaya-kafka/kafkatopics/oceana.interval.topic
  UID:                 9a6aee77-be54-448e-a617-9851af9fdbbd
Spec:
  Config:
    retention.ms:  300000
  Partitions:      1
  Replicas:        1
  Topic Name:      oceana.interval.topic
Status:
  Conditions:
    Last Transition Time:  2021-03-08T15:21:01.687956Z
    Status:                True
    Type:                  Ready
  Observed Generation:     1
Events:                    <none>
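
The delay visible in the output above can be quantified by subtracting the two timestamps. The following is a minimal Python sketch; the timestamp values are copied from the describe output, while the helper name `parse_k8s_timestamp` and the variable names are illustrative only.

```python
from datetime import datetime


def parse_k8s_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as 2021-03-08T15:15:07Z."""
    # Replace the trailing "Z" so fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Values taken from "Creation Timestamp" and "Last Transition Time" above.
created = parse_k8s_timestamp("2021-03-08T15:15:07Z")
ready = parse_k8s_timestamp("2021-03-08T15:21:01.687956Z")

delay = ready - created
print(f"Topic became Ready {delay.total_seconds():.1f} s after creation")
# prints: Topic became Ready 354.7 s after creation
```

That is just under six minutes between the resource being created and the Ready condition being set.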

Expected behavior A Kafka client should be able to connect to the topic within a few seconds of it being created.

Environment (please complete the following information):

  • Strimzi version: 0.21.0
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.17.9
  • Infrastructure: Kubernetes on VMware

YAML files and logs

Attached is the log file, which shows the added event for the topic "oceana.interval.topic" coming in at "2021-03-08 15:15:07", but the reconciliation and the creation of the topic happening only at "2021-03-08 15:21:01", after the periodic reconciliation kicks in.

Ihjaz avatar Apr 20 '21 08:04 Ihjaz

CC @tombentley @sknot-rh

scholzj avatar Apr 20 '21 08:04 scholzj

Is it possible to provide logs at DEBUG level? If I am reading the logs correctly, you created the oceana.interval.topic topic (it seems to have been created in Kafka too), then deleted it (15:14:51) and created it again (15:15:07). After the second creation it is not recreated in Kafka. @tombentley is 16 seconds enough to delete a Kafka topic from the broker?

sknot-rh avatar Apr 20 '21 08:04 sknot-rh

Hi,

Looks like this happens only when the create comes immediately after a delete.

I'm attaching the logs at DEBUG level. The topic 'testtopic' was deleted at '2021-04-20 17:08:38' and then created again at '2021-04-20 17:08:52'. topic-operator.txt

Ihjaz avatar Apr 20 '21 11:04 Ihjaz

Hi,

Any plans to work on this in the near future?

Ihjaz avatar Aug 05 '21 05:08 Ihjaz

I think this kind of race will become a lot easier to detect once Kafka's support for topic ids matures and topic ids can be accessed via the Admin client. KAFKA-10774 in particular would be beneficial. Hopefully that will be merged for Kafka 3.1, and perhaps we could start conditionally using it then (by conditionally I mean if the broker supports it, falling back to the current behaviour if not), though it's possible that it's not worth the complication and we'd decide it is simpler to wait until Strimzi drops support for 3.0.

tombentley avatar Aug 09 '21 09:08 tombentley
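
The conditional approach described in the comment above could look roughly like the following. This is a hypothetical Python sketch, not Topic Operator code: topic ids are treated as opaque strings, the function name `is_recreated` is invented for illustration, and how the ids would actually be obtained from the Admin client is not shown.

```python
from typing import Optional


def is_recreated(known_id: Optional[str], observed_id: Optional[str]) -> Optional[bool]:
    """Decide whether a topic with the same name is a new incarnation.

    Returns True or False when both topic ids are available, and None when
    either is missing, in which case the caller would fall back to the
    current (name-based) behaviour.
    """
    if known_id is None or observed_id is None:
        # Assumption: None models a broker or client that does not expose topic ids.
        return None
    # Same name but a different id means the topic was deleted and created again.
    return known_id != observed_id


# Hypothetical ids, for illustration only.
print(is_recreated("id-A", "id-A"))  # False: same incarnation of the topic
print(is_recreated("id-A", "id-B"))  # True: deleted and recreated in between
print(is_recreated(None, "id-B"))    # None: ids unavailable, fall back
```

The three-valued result keeps the fallback explicit, so a caller cannot mistake "ids unavailable" for "topic unchanged".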

Hi, I see that 3.1 is released. Do you have plans to add the conditional approach you mentioned above to get this addressed if Kafka version 3.1 is being used?

Thank You!!

dineshudayakumar avatar Apr 05 '22 16:04 dineshudayakumar

Triaged on 21.7.2022: Still seems to be a bug. Should be kept open.

scholzj avatar Jul 21 '22 15:07 scholzj

The Bidirectional Topic Operator (BTO) has been replaced by the new Unidirectional Topic Operator from Strimzi 0.39. There are no plans to fix any outstanding issues in the old BTO and this issue can be closed.

scholzj avatar Jan 04 '24 19:01 scholzj