kafka-connect-cosmosdb

Batching of updates/inserts

Open · kahole opened this issue · 4 comments

## Problem Statement

Writes to Cosmos DB from the sink connector are performed one at a time, generating excess traffic and consuming more RUs than necessary.

## Proposed Solution

Batch updates/inserts that arrive close together. For example, if 50 updates arrive within 0.5 seconds, they should be written as one batch. The usual approach is a window that opens when the first message is received and closes after a fixed interval; everything arriving within that window becomes part of the same batch write.
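The windowing idea described above could be sketched roughly as below. This is an illustrative buffer, not code from this connector; the class name and timestamps are hypothetical, and a real implementation inside a `SinkTask` would also flush on `preCommit`/`flush` callbacks:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: records arriving within the window are grouped,
// and the group is emitted as one batch once the window has closed.
public class WindowBatcher<T> {
    private final long windowMs;           // window length, e.g. 500 ms
    private final List<T> buffer = new ArrayList<>();
    private long windowStart = -1;         // -1 means no window is open

    public WindowBatcher(long windowMs) {
        this.windowMs = windowMs;
    }

    // Add a record observed at nowMs. If this record arrives after the
    // current window has closed, the previous batch is returned for a
    // bulk write; otherwise null is returned and the record is buffered.
    public List<T> add(T record, long nowMs) {
        List<T> flushed = null;
        if (windowStart >= 0 && nowMs - windowStart > windowMs) {
            flushed = new ArrayList<>(buffer);
            buffer.clear();
            windowStart = -1;
        }
        if (windowStart < 0) {
            windowStart = nowMs;           // open a new window
        }
        buffer.add(record);
        return flushed;
    }

    // Force out whatever is currently buffered (e.g. on task flush).
    public List<T> flush() {
        List<T> flushed = new ArrayList<>(buffer);
        buffer.clear();
        windowStart = -1;
        return flushed;
    }
}
```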

## Additional context

Other Kafka sink connectors in the same style as this one have this functionality. JDBC:

See `batch.size` here: https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/sink_config_options.html.

An interesting discussion about batching in the JDBC connector: https://github.com/confluentinc/kafka-connect-jdbc/issues/290. Batch sizing also seems adjustable, to an extent, through the underlying consumer configuration:

```
consumer.fetch.min.bytes=1500000
consumer.fetch.max.wait.ms=1500
consumer.max.poll.records=4000
```

(Note: the consumer property is named `fetch.max.wait.ms`, not `fetch.wait.max.ms` as written in the linked discussion.)
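For reference, the same tuning can be applied per connector rather than worker-wide using the `consumer.override.` prefix (KIP-458, Kafka 2.3+), assuming the worker's client override policy permits it. The connector name and class below are illustrative placeholders:

```json
{
  "name": "cosmosdb-sink",
  "config": {
    "connector.class": "com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector",
    "consumer.override.fetch.min.bytes": "1500000",
    "consumer.override.fetch.max.wait.ms": "1500",
    "consumer.override.max.poll.records": "4000"
  }
}
```

This only changes how many records each `put()` call receives; the sink still needs batching logic to turn a large poll into a bulk write against Cosmos DB.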

## Next Steps

  • [x] Team consensus to proceed
  • [ ] Schedule Design Session
  • [ ] Complete Design Review

kahole · Jun 11 '21 08:06