kafka-connect-http
Duplicated records with low `http.timer.interval.millis`
Describe the bug
When the `http.timer.interval.millis` value is lower than the `offset.flush.interval.ms` Kafka Connect worker setting, the records in the topic are duplicated every `http.timer.interval.millis` until the offset is committed.
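To make the mechanism concrete, here is a minimal sketch that models the behaviour described above. It is not the plugin's code: the class `CommitOnlyOffsetModel` and its methods are hypothetical, and it assumes polls and flushes fall on a fixed grid, which the real worker does not guarantee. The point it illustrates is that a request offset that only moves on flush makes every poll in between return the same record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: the task remembers what it emitted, but only
// moves its request offset in commit(), which the worker is assumed to call
// about once per offset.flush.interval.ms.
final class CommitOnlyOffsetModel {
    private long offset = 0;               // offset used to build the next request
    private long emittedHighWatermark = 0; // highest record id emitted so far

    // Returns the ids of upstream records newer than the current offset.
    List<Long> poll(List<Long> upstream) {
        List<Long> emitted = new ArrayList<>();
        for (long id : upstream) {
            if (id > offset) {
                emitted.add(id);
                emittedHighWatermark = Math.max(emittedHighWatermark, id);
            }
        }
        return emitted;
    }

    // Invoked on offset flush: only here does the request offset advance.
    void commit() {
        offset = Math.max(offset, emittedHighWatermark);
    }

    // Counts how often a single upstream record is written when a poll runs
    // every timerMillis before the first flush at flushMillis.
    static int copiesOfSingleRecord(long timerMillis, long flushMillis) {
        CommitOnlyOffsetModel task = new CommitOnlyOffsetModel();
        List<Long> upstream = List.of(1L);
        int written = 0;
        for (long t = timerMillis; t <= flushMillis; t += timerMillis) {
            written += task.poll(upstream).size(); // offset unchanged: same record again
        }
        task.commit();
        written += task.poll(upstream).size();     // after the flush nothing is new
        return written;
    }
}
```

With a 1000 ms timer and a 60000 ms flush interval, `copiesOfSingleRecord(1000, 60000)` returns 60 in this model, while equal intervals give a single copy.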
To Reproduce
Set the `http.timer.interval.millis` connector config to 1000 ms.
Use the default `offset.flush.interval.ms` worker setting of 60000 ms.
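The reproduction boils down to comparing one connector setting with one worker setting. As a hedged illustration, a hypothetical pre-flight check (`FlushIntervalCheck` is an invented name, not part of the plugin or Kafka Connect) could flag the risky combination, falling back to the 60000 ms default when the worker does not set `offset.flush.interval.ms`. It assumes the connector sets `http.timer.interval.millis` explicitly.

```java
import java.util.Properties;

// Hypothetical pre-flight check: flags a connector/worker pair whose HTTP poll
// timer is shorter than the worker's offset flush interval.
final class FlushIntervalCheck {
    // Default of the offset.flush.interval.ms worker setting (60000 ms).
    static final long DEFAULT_OFFSET_FLUSH_MS = 60_000L;

    static boolean isDuplicationProne(Properties connectorConfig, Properties workerConfig) {
        String timerRaw = connectorConfig.getProperty("http.timer.interval.millis");
        if (timerRaw == null) {
            // Assumption: the check only handles an explicitly configured timer.
            throw new IllegalArgumentException("http.timer.interval.millis must be set explicitly");
        }
        long timerMillis = Long.parseLong(timerRaw.trim());
        long flushMillis = Long.parseLong(workerConfig.getProperty(
                "offset.flush.interval.ms", Long.toString(DEFAULT_OFFSET_FLUSH_MS)).trim());
        return timerMillis < flushMillis;
    }
}
```

For the settings above (a 1000 ms timer and an unset flush interval), the check returns `true`.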
Expected behavior
Records should appear once in the target topic.
For each `poll`, the offset from the last committed record should be used.
Updating the `offset` value should be done at the start of each `poll` instead of during `commit`.
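The expected behaviour can be sketched as a variant of the same kind of model: the request offset advances at the start of every poll from records already acknowledged by Kafka, rather than waiting for the flush. This is a hypothetical sketch, not a patch for the plugin; `AdvanceOnPollModel` is an invented name, and its `commitRecord` only loosely mirrors the per-record acknowledgment callback that Kafka Connect source tasks receive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed behaviour: acknowledged progress is picked
// up at the start of every poll instead of on offset flush.
final class AdvanceOnPollModel {
    private long offset = 0;                    // offset used to build the next request
    private long acknowledgedHighWatermark = 0; // highest id confirmed via commitRecord

    List<Long> poll(List<Long> upstream) {
        // Proposed change: move the offset forward before building the request.
        offset = Math.max(offset, acknowledgedHighWatermark);
        List<Long> emitted = new ArrayList<>();
        for (long id : upstream) {
            if (id > offset) {
                emitted.add(id);
            }
        }
        return emitted;
    }

    // Called once per record the producer has written to Kafka.
    void commitRecord(long id) {
        acknowledgedHighWatermark = Math.max(acknowledgedHighWatermark, id);
    }

    // Polls `polls` times, acknowledging every emitted record before the next poll.
    static int copiesOfSingleRecord(int polls) {
        AdvanceOnPollModel task = new AdvanceOnPollModel();
        List<Long> upstream = List.of(1L);
        int written = 0;
        for (int i = 0; i < polls; i++) {
            for (long id : task.poll(upstream)) {
                task.commitRecord(id);
                written++;
            }
        }
        return written;
    }
}
```

Here 60 polls write the record once. The simple maximum assumes acknowledgments arrive in order; with several records in flight, a real implementation would need to track the highest contiguous acknowledged offset. The sketch also assumes acknowledgments land before the next poll starts; a record still in flight at that point would be requested again.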
Kafka Connect:
- Version 3.4.0
Plugin:
- Version 0.8.11
Yeah, I am having trouble with this as well. I would really like to poll more often than every minute, but every poll within the 60000 ms window returns duplicates.
Example from logs:
[2024-01-18 15:32:26,334] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:36,393] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:46,457] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:56,510] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:33:05,610] INFO WorkerSourceTask{id=buypass-playercards-connect-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2024-01-18 15:33:06,567] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:33:16,621] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
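One way to quantify the repetition in logs like these is to sum the "new" record counts reported for each requested offset: an offset that reports new records on more than one poll suggests the same records were emitted repeatedly. The scanner below is a hypothetical helper that relies only on the message format visible in the excerpt above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner: sums the "new" record counts logged by
// HttpSourceTask for each requested offset, in order of first appearance.
final class RepeatedOffsetScanner {
    private static final Pattern REQUEST = Pattern.compile(
            "Request for offset \\{(.*?)\\} yields (\\d+)/(\\d+) new records");

    static Map<String, Integer> newRecordsPerOffset(List<String> logLines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find()) {
                // group(1) is the offset map, group(2) the "new" count
                totals.merge(m.group(1), Integer.parseInt(m.group(2)), Integer::sum);
            }
        }
        return totals;
    }
}
```

For the excerpt above, the first offset totals 4 (one new record on each of four polls) and the second totals 0. This is consistent with a single record, presumably the same one each time, being written on every 10-second poll until the offset commit logged at 15:33:05 moves the offset forward.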
Is there any information regarding this issue?