kafka-connect-cosmosdb
Batching of updates/inserts
## Problem Statement

Writes to Cosmos from the sink are done one by one, causing a lot of traffic and RUs to be expended.
## Proposed Solution

Batch updates/inserts that arrive close together. If 50 updates arrive within, say, 0.5 seconds, they should be written as a single batch. This is usually done with a window that opens when a message is received and closes after a fixed amount of time; everything arriving within that window becomes part of the same batch.
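A minimal sketch of the window-based batching described above. The class name `RecordBatcher` and its methods are hypothetical, not part of the connector; the idea is simply that a batch is flushed when either a size limit is reached or the window that opened with the first record expires:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: buffers records and flushes a batch when either
// maxBatchSize is reached or maxWaitMs has elapsed since the first
// record of the current window arrived.
public class RecordBatcher<T> {
    private final int maxBatchSize;
    private final long maxWaitMs;
    private final List<T> buffer = new ArrayList<>();
    private long windowStart = -1;

    public RecordBatcher(int maxBatchSize, long maxWaitMs) {
        this.maxBatchSize = maxBatchSize;
        this.maxWaitMs = maxWaitMs;
    }

    /** Adds a record; returns a full batch to write, or null while still buffering. */
    public List<T> add(T record, long nowMs) {
        if (buffer.isEmpty()) {
            windowStart = nowMs; // window opens with the first record
        }
        buffer.add(record);
        if (buffer.size() >= maxBatchSize || nowMs - windowStart >= maxWaitMs) {
            return drain();
        }
        return null;
    }

    /** Called periodically (e.g. from the sink task's flush) to close an expired window. */
    public List<T> pollExpired(long nowMs) {
        if (!buffer.isEmpty() && nowMs - windowStart >= maxWaitMs) {
            return drain();
        }
        return null;
    }

    private List<T> drain() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```

In a sink task, `add` would be called from `put()` for each incoming record, and each returned batch written to Cosmos with one bulk request instead of one request per record.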
## Additional context

Other Kafka sink connectors in the same style as this one have this functionality. For example, the JDBC sink connector exposes a `batch.size` option: https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/sink_config_options.html.
Interesting discussion about batching in the JDBC connector: https://github.com/confluentinc/kafka-connect-jdbc/issues/290. Batching also seems to be adjustable to an extent via consumer configuration overrides:
```properties
consumer.fetch.min.bytes=1500000
consumer.fetch.wait.max.ms=1500
consumer.max.poll.records=4000
```
## Next Steps
- [x] Team consensus to proceed
- [ ] Schedule Design Session
- [ ] Complete Design Review