kafka-connect-cosmosdb

Nice to have worker task:topic mapping in Sink connector

Open jcocchi opened this issue 5 years ago • 7 comments

Currently the sink connector spins up as many workers as the maxTasks configuration and assigns all topics to all workers. Because of the topic -> collection mapping, it would be more efficient to assign specific workers to a subset of topics instead of having all workers read from all topics.

Investigate the implications of this and implement the best solution.

This refers to the taskConfigs function of the CosmosDBSinkConnector
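
A minimal sketch of one way such a split could be computed, assuming a round-robin distribution of topics across tasks. The class name TopicTaskMapping and the config key "assigned.topics" are hypothetical and exist neither in this connector nor in Kafka Connect. Kafka Connect itself decides which topic partitions each sink task consumes through the consumer group, so whether a per-task topic list like this can be honoured is part of what needs investigating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mirrors the shape of taskConfigs(int maxTasks),
// but has no dependency on Kafka Connect.
final class TopicTaskMapping {
    // Hypothetical config key, not defined by the connector or by Kafka Connect.
    static final String ASSIGNED_TOPICS_KEY = "assigned.topics";

    // Returns one config map per task, each carrying a disjoint subset of topics.
    static List<Map<String, String>> taskConfigs(
            Map<String, String> connectorProps, List<String> topics, int maxTasks) {
        if (maxTasks < 1 || topics.isEmpty()) {
            return Collections.emptyList();
        }
        // Never create more tasks than there are topics to hand out.
        int groups = Math.min(maxTasks, topics.size());
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < groups; i++) {
            buckets.add(new ArrayList<>());
        }
        // Round-robin: topic i goes to bucket i mod groups.
        for (int i = 0; i < topics.size(); i++) {
            buckets.get(i % groups).add(topics.get(i));
        }
        List<Map<String, String>> configs = new ArrayList<>();
        for (List<String> bucket : buckets) {
            // Copy the connector properties so tasks do not share a mutable map.
            Map<String, String> taskProps = new HashMap<>(connectorProps);
            taskProps.put(ASSIGNED_TOPICS_KEY, String.join(",", bucket));
            configs.add(taskProps);
        }
        return configs;
    }
}
```

With topics a, b, c and maxTasks = 2, the first task config carries "a,c" and the second carries "b". Round-robin is only one option; grouping by target collection may fit the topic -> collection mapping better.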

jcocchi avatar Jun 28 '19 23:06 jcocchi

does it work as is? i.e. is this a feature we need to have, or a nice to have enhancement that could improve efficiency?

is it "should have" or "nice to have"?

ryancrawcour avatar Jul 01 '19 21:07 ryancrawcour

@ryancrawcour Yep, it does work as is and is a "nice to have."

Currently all workers listen to all topics. Because of the way we write records (looping through the messages for each topic, then writing them), it would be more efficient to have certain workers dedicated to certain topics so there are fewer loops per processing chunk.

jcocchi avatar Jul 02 '19 21:07 jcocchi

Makes sense. Thanks. Will mark it as a future enhancement.

ryancrawcour avatar Jul 02 '19 21:07 ryancrawcour

Out of interest, how does the Mongo connector do this?

ryancrawcour avatar Jul 02 '19 22:07 ryancrawcour

Follow up: do a spike to see if it's feasible to dedicate a worker to a specific topic.

If so, are there performance improvements?

brandynbrown avatar Jan 27 '21 21:01 brandynbrown

Spike: investigate what the Cassandra connector does when monitoring multiple topics.

brandynbrown avatar Jan 27 '21 21:01 brandynbrown

blocked by #292

brandynbrown avatar Feb 02 '21 15:02 brandynbrown