kafka_exporter icon indicating copy to clipboard operation
kafka_exporter copied to clipboard

Auto creating kafka topics after remove - v1.3.0 and v1.3.1.

Open patdoman opened this issue 4 years ago • 2 comments

Hi, I noticed that after remove topic is recreating. Checked on v1.3.0 and v1.3.1. v1.2.0 doesn't have this problem. Temporary solution is to stop kafka-exporter pod and then remove given topic. kafka version: 2.7.1

message [KafkaApi-0] Auto creation of topic ams-azure.x-test with 1 partitions and replication factor 3 is successful

No logs related to topic on kafka exporter side

patdoman avatar Aug 25 '21 15:08 patdoman

I cannot reproduce this in v1.3.2. Please tell me how to reproduce, thanks. image

danielqsj avatar Sep 08 '21 04:09 danielqsj

Hi @danielqsj, I've run into this problem and was able to reproduce it like this:

First, I'm showing that no other topics exist:

$ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
$

Then I'm creating a single topic on the server:

$ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-test-topic --replication-factor=1 --partitions=2
Created topic my-test-topic.
$ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-test-topic
Topic: my-test-topic	PartitionCount: 2	ReplicationFactor: 1	Configs: 
	Topic: my-test-topic	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: my-test-topic	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
$

As you can see, this topic has been created with 2 partitions.

Then I'm starting kafka_exporter:

$ ./kafka_exporter --kafka.server=localhost:9092 --verbosity=1 --offset.show-all --concurrent.enable
I0202 14:56:25.701859 1293261 kafka_exporter.go:778] Starting kafka_exporter (version=, branch=, revision=)
I0202 14:56:25.701970 1293261 kafka_exporter.go:779] Build context (go=go1.16.3, user=, date=)
I0202 14:56:25.705241 1293261 kafka_exporter.go:938] Listening on HTTP :9308
...

In another terminal, I'm starting a tight loop of curl executions that will frequently poll /metrics:

$ while true ; do curl -s http://localhost:9308/metrics > /dev/null ; sleep 1 ; done

I'm now deleting the topic I created earlier, wait a few seconds, then describe the topic:

$ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic=my-test-topic ; sleep 3 ; ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-test-topic
Topic: my-test-topic	PartitionCount: 1	ReplicationFactor: 1	Configs: 
	Topic: my-test-topic	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
$

What you can see here is that suddenly, the topic has only a single partition. We use an internal tool to more closely monitor kafka which actually shows that the topic gets deleted and then recreated, but I haven't been able to show this with the normal kafka-topics.sh which seems to be a bit slow.

When looking at the kafka_exporter logs, we also see that from the exporter's point of view, the topic didn't exist at some point:

I0202 15:19:06.807099 1323758 kafka_exporter.go:645] Fetching consumer group metrics
I0202 15:19:07.824354 1323758 kafka_exporter.go:645] Fetching consumer group metrics
I0202 15:19:08.837300 1323758 kafka_exporter.go:645] Fetching consumer group metrics
E0202 15:19:10.138413 1323758 kafka_exporter.go:408] Cannot get leader of topic my-test-topic partition 1: kafka server: Request was for a topic or partition that does not exist on this broker.
E0202 15:19:10.142092 1323758 kafka_exporter.go:417] Cannot get current offset of topic my-test-topic partition 1: kafka server: Request was for a topic or partition that does not exist on this broker.
E0202 15:19:10.146084 1323758 kafka_exporter.go:429] Cannot get oldest offset of topic my-test-topic partition 1: kafka server: Request was for a topic or partition that does not exist on this broker.
E0202 15:19:10.147423 1323758 kafka_exporter.go:438] Cannot get replicas of topic my-test-topic partition 1: kafka server: Request was for a topic or partition that does not exist on this broker.
E0202 15:19:10.148769 1323758 kafka_exporter.go:447] Cannot get in-sync replicas of topic my-test-topic partition 1: kafka server: Request was for a topic or partition that does not exist on this broker.
I0202 15:19:10.148879 1323758 kafka_exporter.go:645] Fetching consumer group metrics
I0202 15:19:11.192781 1323758 kafka_exporter.go:645] Fetching consumer group metrics
I0202 15:19:12.233610 1323758 kafka_exporter.go:645] Fetching consumer group metrics
I0202 15:19:13.260190 1323758 kafka_exporter.go:645] Fetching consumer group metrics

In the kafka logs, this looks like this (the timestamps in the kafka log are UTC, while in the kafka_exporter log they're CET):

2-02-02 14:19:09,423] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions my-test-topic-0,my-test-topic-1 (kafka.server.ReplicaFetcherManager)
[2022-02-02 14:19:09,423] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
[2022-02-02 14:19:09,424] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions my-test-topic-1 (kafka.server.ReplicaFetcherManager)
[2022-02-02 14:19:09,430] INFO Log for partition my-test-topic-1 is renamed to /kafka/kafka-logs-dff5fd8d7b17/my-test-topic-1.12bc58943d004e44b8a41322a20693c2-delete and is scheduled for deletion (kafka.log.LogManager)
[2022-02-02 14:19:09,431] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions my-test-topic-0 (kafka.server.ReplicaFetcherManager)
[2022-02-02 14:19:09,435] INFO Log for partition my-test-topic-0 is renamed to /kafka/kafka-logs-dff5fd8d7b17/my-test-topic-0.3a7aced711714e2fa7d4a1716bd2568a-delete and is scheduled for deletion (kafka.log.LogManager)
[2022-02-02 14:19:09,881] INFO Topic creation {"version":1,"partitions":{"0":[1]}} (kafka.admin.AdminUtils$)
[2022-02-02 14:19:09,882] INFO [KafkaApi-1] Auto creation of topic my-test-topic with 1 partitions and replication factor 1 is successful (kafka.server.KafkaApis)
[2022-02-02 14:19:09,891] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions my-test-topic-0 (kafka.server.ReplicaFetcherManager)
[2022-02-02 14:19:09,892] INFO Loading producer state from offset 0 for partition my-test-topic-0 with message format version 2 (kafka.log.Log)
[2022-02-02 14:19:09,892] INFO Completed load of log my-test-topic-0 with 1 log segments, log start offset 0 and log end offset 0 in 1 ms (kafka.log.Log)
[2022-02-02 14:19:09,893] INFO Created log for partition [my-test-topic,0] in /kafka/kafka-logs-dff5fd8d7b17 with properties {compression.type -> producer, message.format.version -> 1.0-IV0, file.delete.delay.ms -> 60000, max.message.bytes -> 104857600, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, min.insync.replicas -> 1, segment.jitter.ms -> 0, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, cleanup.policy -> [delete], flush.ms -> 9223372036854775807, segment.ms -> 604800000, segment.bytes -> 1073741824, retention.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760, flush.messages -> 9223372036854775807}. (kafka.log.LogManager)

As you can see, kafka deleted partitions 0 and 1 for my-test-topic, and then shortly afterwards, auto creation for my-test-topic is triggered, with 1 partition.

Given this behaviour, it very much looks like kafka_exporter's metric collection seems to trigger topic auto creation.

(N.B.: reproducing the problem like this doesn't always 100% work reliably. If you can't get it to run the first time, try re-running the delete topic + wait + describe topic command a few times)

akrennmair avatar Feb 02 '22 14:02 akrennmair

Hi. Are there any updates? We also face this issue with v1.4.2 which comes together with the strimzi-kafka-operator v.0.28. Is it maybe possible to prevent this using the consumer config allow.auto.create.topics?

Best regards David

davidpiendl avatar Aug 22 '22 17:08 davidpiendl

@davidpiendl turning off topic auto-creation is the only workaround we were able to find. Lots of experimentation and a review or the kafka_exporter source code led us to the conclusion that it is merely triggering a deeper bug in Kafka itself (or maybe that's exactly Kafka's intended behaviour, who knows).

akrennmair avatar Aug 22 '22 18:08 akrennmair

@akrennmair thanks for the quick response. I saw that the used Samara client allows to configure AllowAutoCreateTopics for consumers and metadata refreshs. See https://github.com/Shopify/sarama/blob/main/config.go#L158 . It defaults to true, so consumers and metadata refreshes cause recreation.

I can not find the setting of this property to false in kafka_exporter, which should definitly be done. Did I overlook something?

davidpiendl avatar Aug 23 '22 12:08 davidpiendl

@davidpiendl sorry for not being clearer: we disabled topic auto-creation on the Kafka server side. That way we managed to contain the issue. It required a review of our whole codebase whether anything implicitly depends on topic auto-creation, and after fixing these, we were safe to switch off the flag on the servers.

akrennmair avatar Aug 23 '22 13:08 akrennmair

@akrennmair @davidpiendl Thanks for reporting this issue. I add a new option kafka.allow-auto-topic-creation to control it in https://github.com/danielqsj/kafka_exporter/commit/5cb32d4bfeb48636ef1b5d5ebbb5856c6bc776d4, you can set --no-kafka.allow-auto-topic-creation to have a try. The image is building in https://github.com/danielqsj/kafka_exporter/actions/runs/2912169325, after this workflow finished, you can have try the danielqsj/kafka-exporter:latest Waiting for your response :)

danielqsj avatar Aug 23 '22 14:08 danielqsj

@danielqsj Maybe a stupid question ... but since the new option defaults the true -> what would be the cases when it is useful for the Kafka Exporter to auto.create the topics? I'm not entirely sure I understand when would the auto-creation be useful here. Is it just for backwards compatibility? Or do you see some value in it?

scholzj avatar Aug 23 '22 19:08 scholzj

@danielqsj Maybe a stupid question ... but since the new option defaults the true -> what would be the cases when it is useful for the Kafka Exporter to auto.create the topics? I'm not entirely sure I understand when would the auto-creation be useful here. Is it just for backwards compatibility? Or do you see some value in it?

@scholzj In the previous kafka_exporter, we do not set this option, so the default value true is used in sarama. I wanna to keep the forward compatibility, and set the default value as true here. But I'm open to change the default value.

danielqsj avatar Aug 24 '22 04:08 danielqsj

After much thought, I agree that the default value of false seems better. v1.6.0 has been released, please try it.

danielqsj avatar Aug 24 '22 06:08 danielqsj

Gret, thanks for the new release 👍

scholzj avatar Aug 24 '22 12:08 scholzj

@danielqsj it seems that this has fixed our problem. I can reproduce the error with v1.5.0, but not with 1.6.0 anymore.

davidpiendl avatar Aug 24 '22 14:08 davidpiendl

@davidpiendl cool, thanks for reporting issue again. :)

danielqsj avatar Aug 25 '22 02:08 danielqsj