azure-sdk-for-java
azure-sdk-for-java copied to clipboard
KafkaV2SinkConnector
Feature request https://github.com/Azure/azure-sdk-for-java/issues/38769
In this PR, we added the kafka CosmosDB Sink Connector V2 version.
Config
-
kafka.connect.cosmos.accountEndpoint
-> No default. Cosmos DB Account Endpoint Uri -
kafka.connect.cosmos.accountKey
-> No default. Cosmos DB Account Key -
kafka.connect.cosmos.useGatewayMode
-> Default false. Flag to indicate whether to use gateway mode. By default it is false. -
kafka.connect.cosmos.preferredRegionsList
-> Default empty list. Preferred regions list to be used for a multi region Cosmos DB account. This is a comma separated value (e.g.,[East US, West US]
orEast US, West US
) provided preferred regions will be used as hint. You should use a collocated kafka cluster with your Cosmos DB account and pass the kafka cluster region as preferred region. See list of azure regions here -
kafka.connect.cosmos.applicationName
-> Default empty string. Will be added as the userAgent suffix. -
kafka.connect.cosmos.sink.database.name
-> No Default. CosmosDb database name. -
kafka.connect.cosmos.sink.containers.topicMap
-> No Default. A comma delimited list of Kafka topics mapped to Cosmos containers. For example: topic1#con1,topic2#con2. -
kafka.connect.cosmos.sink.errors.tolerance
-> DefaultNone
. Error tolerance level after exhausting all retries.None
for fail on error.All
for log and continue -
kafka.connect.cosmos.sink.bulk.enabled
-> Defaulttrue
. Flag to indicate whether Cosmos DB bulk mode is enabled for Sink connector. -
kafka.connect.cosmos.sink.bulk.maxConcurrentCosmosPartitions
-> Default -1. Usually this is only required to be tuned for large containers. Cosmos DB Item Write Max Concurrent Cosmos Partitions. If not specified it will be determined based on the number of the container's physical partitions which would indicate every Spark partition is expected to have data from all Cosmos physical partitions. If specified it indicates from at most how many Cosmos Physical Partitions each Spark partition contains data. So this config can be used to make bulk processing more efficient when input data in Spark has been repartitioned to balance to how many Cosmos partitions each Spark partition needs to write. This is mainly useful for very large containers (with hundreds of physical partitions. -
kafka.connect.cosmos.sink.bulk.initialBatchSize
-> Default 1. Cosmos DB initial bulk micro batch size - a micro batch will be flushed to the backend when the number of documents enqueued exceeds this size - or the target payload size is met. The micro batch size is getting automatically tuned based on the throttling rate. By default the initial micro batch size is 1. Reduce this when you want to avoid that the first few requests consume too many RUs. -
kafka.connect.cosmos.sink.write.strategy
-> DefaultItemOverwrite
. Cosmos DB Item write Strategy:ItemOverwrite
(using upsert),ItemAppend
(using create, ignore pre-existing items i.e., Conflicts),ItemDelete
(deletes based on id/pk of data frame),ItemDeleteIfNotModified
(deletes based on id/pk of data frame if etag hasn't changed since collecting id/pk),ItemOverwriteIfNotModified
(using create if etag is empty, update/replace with etag pre-condition otherwise, if document was updated the pre-condition failure is ignored) -
kafka.connect.cosmos.sink.maxRetryCount
-> Default 10. Cosmos DB max retry attempts on write failures for Sink connector. By default, the connector will retry on transient write errors for up to 10 times. -
kafka.connect.cosmos.sink.id.strategy
-> DefaultProvidedInValueStrategy
. A strategy used to populate the document with anid
. Valid strategies are:TemplateStrategy
,FullKeyStrategy
,KafkaMetadataStrategy
,ProvidedInKeyStrategy
,ProvidedInValueStrategy
. Configuration properties prefixed withid.strategy
are passed through to the strategy. For example, when usingid.strategy=TemplateStrategy
, the propertyid.strategy.template
is passed through to the template strategy and used to specify the template string to be used in constructing theid
.
API change check
APIView has identified API level changes in this PR and created following API reviews.
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).