
[Bug] Kafka auto log cleanup not working

Open xiddjp opened this issue 2 years ago • 9 comments

Hi folks,

One problem I encountered is that Kafka's log files will continue to grow and will not be cleared automatically.

I set KAFKA_LOG_RETENTION_MS and KAFKA_LOG_RETENTION_BYTES in the docker-compose file.

Are there any problems with these docker configs?

kafka1:
  restart: always
  image: wurstmeister/kafka:2.13-2.6.0
  ports:
    - 9092:9092
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
    KAFKA_ADVERTISED_HOST_NAME: 10.17.19.210
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://10.17.19.210:9092
    KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
    KAFKA_CREATE_TOPICS: "requests:100:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600,tb_transport.api.requests:30:1:delete --config=retention.ms=60000 --config=segment.bytes=26214400 --config=retention.bytes=104857600"
    KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
    KAFKA_LOG_RETENTION_BYTES: 1073741824
    KAFKA_LOG_SEGMENT_BYTES: 268435456
    KAFKA_LOG_RETENTION_MS: 300000
    # KAFKA_LOG_CLEANER_ENABLE: 'true'
    KAFKA_LOG_CLEANUP_POLICY: delete
kafka2:
  restart: always
  image: wurstmeister/kafka:2.13-2.6.0
  ports:
    - 9093:9093
  environment:
    KAFKA_BROKER_ID: 2
    KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
    KAFKA_ADVERTISED_HOST_NAME: 10.17.19.210
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://10.17.19.210:9093
    KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093
    KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
    KAFKA_LOG_RETENTION_BYTES: 1073741824
    KAFKA_LOG_CLEANER_ENABLE: 'true'
    KAFKA_LOG_SEGMENT_BYTES: 268435456
    KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 5000
    # KAFKA_LOG_RETENTION_MS: 10000
    KAFKA_LOG_CLEANUP_POLICY: delete

......
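
For context: in wurstmeister/kafka-docker, every KAFKA_-prefixed environment variable is mapped to a server.properties entry (the prefix is stripped, the rest is lowercased, and underscores become dots). So the kafka1 environment above should land in the broker config roughly as the properties below, and the retention.ms=60000 style overrides passed via KAFKA_CREATE_TOPICS take precedence over these broker-wide defaults for those two topics:

# broker-wide defaults derived from the kafka1 environment above
log.retention.bytes=1073741824
log.segment.bytes=268435456
log.retention.ms=300000
log.cleanup.policy=delete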

xiddjp avatar Mar 10 '22 11:03 xiddjp

Has nobody else run into this issue?

xiddjp avatar Mar 15 '22 02:03 xiddjp

Are you referring to Kafka's own log files (server.log, controller.log, log-cleaner.log, etc.), or to the topic logs (requests-N, tb_transport.api.requests-N)?

I ran a test with your configuration, and the messages I produced to the topic "requests" were deleted after 60000 ms, as expected:

[2022-03-16 13:33:10,738] INFO [ProducerStateManager partition=requests-0] Writing producer snapshot at offset 2 (kafka.log.ProducerStateManager)
[2022-03-16 13:33:10,748] INFO [Log partition=requests-0, dir=/kafka/data] Rolled new log segment at offset 2 in 24 ms. (kafka.log.Log)
[2022-03-16 13:33:10,749] INFO [Log partition=requests-0, dir=/kafka/data] Deleting segment LogSegment(baseOffset=1, size=132, lastModifiedTime=1647437528000, largestRecordTimestamp=Some(1647437529695)) due to retention time 60000ms breach based on the largest record timestamp in the segment (kafka.log.Log)
[2022-03-16 13:33:10,754] INFO [Log partition=requests-0, dir=/kafka/data] Incremented log start offset to 2 due to segment deletion (kafka.log.Log)
[2022-03-16 13:34:10,755] INFO [Log partition=requests-0, dir=/kafka/data] Deleting segment files LogSegment(baseOffset=1, size=132, lastModifiedTime=0, largestRecordTimestamp=Some(1647437529695)) (kafka.log.Log)
[2022-03-16 13:34:10,759] INFO Deleted log /kafka/data/requests-0/00000000000000000001.log.deleted. (kafka.log.LogSegment)
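
For anyone wanting to repeat this check, a minimal sketch (assuming, as in the compose file above, a broker container named kafka1 with the Kafka scripts on its PATH and the log dir /kafka/data shown in these logs):

$ docker exec -it kafka1 bash
$ kafka-console-producer.sh --broker-list localhost:9092 --topic requests
> hello
$ # wait past retention.ms (60000 ms) plus log.retention.check.interval.ms, then:
$ ls -l /kafka/data/requests-*/

One caveat worth knowing: retention only ever deletes closed segments. The active segment is not eligible until it rolls (driven by segment.bytes / segment.ms), which is a common reason topic data appears to outlive retention.ms.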

TBragi avatar Mar 16 '22 13:03 TBragi

I mean the topic logs, i.e. the data actually stored. There are a lot of topics in my Kafka cluster, not only the "requests" topic. How can I ensure that the data of the other topics is cleared as well?

xiddjp avatar Mar 17 '22 03:03 xiddjp

In your case you create the requests and tb_transport.api.requests topics with specific configurations regarding retention.ms and retention.bytes.

Any other topics will be created with the cluster default settings unless you specify otherwise. You can use the Kafka CLI tools to check these settings at the broker level or for a specific topic:

https://stackoverflow.com/questions/35997137/how-do-you-get-default-kafka-configs-global-and-per-topic-from-command-line
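
For example (a sketch; assuming the broker is reachable on localhost:9092, and noting that the exact flags vary a little between Kafka versions):

$ # per-topic config overrides for one topic
$ kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name requests
$ # all configs of broker 1, including static/default values (--all needs a recent 2.x broker)
$ kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type brokers --entity-name 1 --all
$ # the topic's partition layout plus any config overrides
$ kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic requests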

You could also include a GUI that lets you easily check a topic's settings and adjust them if needed. I have had good experiences with both Kafdrop and kafka-ui.

TBragi avatar Mar 22 '22 07:03 TBragi

Got it, thank you.

xiddjp avatar Mar 22 '22 07:03 xiddjp

@xiddjp can this issue be closed? 😃

TBragi avatar Apr 08 '22 06:04 TBragi

How do I create the topics with specific configurations without using the command line?
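
For reference, the compose file at the top of this thread already does that via the image's KAFKA_CREATE_TOPICS environment variable (name:partitions:replicas:cleanup-policy, optionally followed by --config overrides as in the original post; my-topic below is a hypothetical name, and the image's create-topics script defines the exact supported syntax):

environment:
  KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
  KAFKA_CREATE_TOPICS: "my-topic:10:1:delete --config=retention.ms=60000 --config=retention.bytes=104857600"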

Tbeck-91 avatar Dec 16 '22 20:12 Tbeck-91

We also encountered the same problem: expired data on a topic is not deleted automatically. Our Kafka version is 2.13-2.5.1. We tried manually clearing the data, shortening the topic's retention time, and then restarting the broker, but it did not work. We then lowered log.retention.hours from 168 to 48, manually cleared the data, and restarted the broker again; this time it took effect.
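
For what it's worth, per-topic retention can normally be changed dynamically, with no broker restart; a sketch, assuming a broker on localhost:9092 and a hypothetical topic my-topic (older brokers may need the --zookeeper form of this command instead):

$ kafka-configs.sh --bootstrap-server localhost:9092 --alter \
    --entity-type topics --entity-name my-topic \
    --add-config retention.ms=172800000
$ # deletion still only happens on the next retention check
$ # (log.retention.check.interval.ms), and only for segments that have already rolled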

fanluoo avatar Dec 25 '23 10:12 fanluoo

@xiddjp Have you solved this problem, or do you have any more clues?

fanluoo avatar Dec 25 '23 10:12 fanluoo