kafka-connect-http

Duplicated records with low `http.timer.interval.millis`

Open • tomasz-sadura opened this issue 2 years ago • 1 comment

Describe the bug

When the connector setting `http.timer.interval.millis` is lower than the Kafka Connect worker setting `offset.flush.interval.ms`, the records in the topic are duplicated every `http.timer.interval.millis` until the offset is committed.
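The mechanism can be reproduced with a toy model. This is a hypothetical sketch, not the plugin's code: it assumes the task filters each HTTP response with an in-memory offset that only moves forward when the worker flushes offsets every `offset.flush.interval.ms`, that the producer acknowledges each record immediately, and that records are identified by an integer timestamp. `simulate` and its parameters are illustrative names.

```python
from __future__ import annotations

from collections import Counter


def simulate(timer_ms: int, flush_ms: int, duration_ms: int, record_times: list[int]) -> Counter:
    """Count how often each record (identified by its timestamp) is emitted.

    Hypothetical model: the task filters every poll with an in-memory offset that is
    only advanced when the worker flushes offsets (every flush_ms).
    """
    emitted: Counter = Counter()
    committed_offset = -1   # offset the task filters with; moves only on a flush
    highest_acked = -1      # newest record produced so far
    next_flush = flush_ms
    for now in range(0, duration_ms + 1, timer_ms):      # one poll per timer tick
        while next_flush <= now:                          # worker offset flush(es) due by now
            committed_offset = max(committed_offset, highest_acked)
            next_flush += flush_ms
        # The API returns every record that exists so far; keep those newer than the offset.
        for t in record_times:
            if committed_offset < t <= now:
                emitted[t] += 1
                highest_acked = max(highest_acked, t)     # assume an immediate producer ack
    return emitted
```

In this model, `simulate(1000, 60000, 120000, [0])` emits the single record 60 times, while making the flush interval equal to the timer interval (`simulate(1000, 1000, 120000, [0])`) brings it down to one copy.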

To Reproduce

Set the `http.timer.interval.millis` connector config to 1000 ms. Keep the worker's default `offset.flush.interval.ms` setting of 60000 ms.
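Assuming the task only advances its offset when the worker flushes offsets, a record is re-fetched by every poll until the next flush, so with these settings each record can be written approximately at most:

$$
N_{\text{copies}} \le \left\lceil \frac{\texttt{offset.flush.interval.ms}}{\texttt{http.timer.interval.millis}} \right\rceil = \left\lceil \frac{60000\ \text{ms}}{1000\ \text{ms}} \right\rceil = 60
$$

This is an upper bound: a record that appears in the API partway through the flush window is only fetched by the polls remaining before the flush, so it gets fewer copies.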

Expected behavior

Each record should appear only once in the target topic. Each poll should use the offset of the last committed record, which means the offset value should be updated at the start of each poll instead of during the offset commit.
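The proposed change could look roughly like the following sketch. It is hypothetical and written in Python for brevity (the plugin itself is Java): `PollTimeOffsetTask`, `commit_record` and `poll` are illustrative names, and it assumes the API returns records sorted by ascending offset. The offset only moves past the longest prefix of acknowledged records from the previous poll, so an unacknowledged record is fetched again rather than skipped.

```python
from __future__ import annotations


class PollTimeOffsetTask:
    """Hypothetical task that advances its offset at the start of each poll."""

    def __init__(self, api_records: list[int]):
        self.api_records = api_records   # record offsets (timestamps) the HTTP API would return
        self.offset = -1                 # records at or below this offset are treated as seen
        self.in_flight: list[int] = []   # records returned by the previous poll, in order
        self.acked: set[int] = set()     # in-flight records the producer has acknowledged

    def commit_record(self, record_offset: int) -> None:
        """Record a producer acknowledgement (the role Kafka Connect's commitRecord plays)."""
        self.acked.add(record_offset)

    def _advance_offset(self) -> None:
        # Move past the longest prefix of in-flight records that were acknowledged,
        # so an unacknowledged record is fetched again instead of being skipped.
        for record_offset in self.in_flight:
            if record_offset not in self.acked:
                break
            self.offset = record_offset

    def poll(self, now: int) -> list[int]:
        self._advance_offset()           # proposed: update the offset here, not in commit()
        new_records = [t for t in self.api_records if self.offset < t <= now]
        self.in_flight = new_records
        self.acked.clear()
        return new_records
```

If every batch is acknowledged before the next poll, each record is returned once regardless of `offset.flush.interval.ms`. A record that is still unacknowledged when the next poll starts is returned again, which keeps at-least-once delivery instead of risking data loss.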

Kafka Connect:

  • Version 3.4.0

Plugin:

  • Version 0.8.11


tomasz-sadura, Feb 17 '23 12:02

Yeah, I am having trouble with this as well. I would really like to poll more often than every minute, but every poll within the 60000 ms window returns duplicates.

Example from logs:

[2024-01-18 15:32:26,334] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:36,393] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:46,457] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:56,510] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:33:05,610] INFO WorkerSourceTask{id=buypass-playercards-connect-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2024-01-18 15:33:06,567] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:33:16,621] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
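To check other deployments for the same symptom, a small script can count how many polls reported new records from an unchanged offset. This is a sketch that assumes the `Request for offset {...} yields N/M new records` message format shown in these logs; the regular expression is inferred from these lines, not from a documented log contract, and `polls_with_new_records_per_offset` is a hypothetical helper.

```python
import re
from collections import Counter

# Inferred from the HttpSourceTask INFO lines above; the offset map has no nested braces.
POLL_LINE = re.compile(r"Request for offset (\{.*?\}) yields (\d+)/\d+ new records")


def polls_with_new_records_per_offset(log_text: str) -> Counter:
    """Map each offset to the number of polls that reported more than 0 new records from it.

    A count above 1 means the same records were very likely produced more than once.
    """
    counts: Counter = Counter()
    for match in POLL_LINE.finditer(log_text):
        offset, new_records = match.group(1), int(match.group(2))
        if new_records > 0:
            counts[offset] += 1
    return counts
```

On the excerpt above it would report four polls with new records for the `ReEqHY0BN-He1Zqe92Zd` offset and none for the `tUszHY0BfTtFkKhrnQU-` offset, consistent with the same record being produced repeatedly until the offset commit.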

Is there any update on this issue?

steam0, Jan 18 '24 15:01