venice icon indicating copy to clipboard operation
venice copied to clipboard

[server] Supporting 4 consumer pools for current version and isolation for AA WC leader.

Open haoxu07 opened this issue 1 year ago • 1 comments

This PR is mainly for simplifying the logic of consumer pool assignment logic when subscribing one topic partition through PartitionReplicaIngestionContext. Inside KafkaConsumerServiceDelegator, A ConsumerPoolStrategyType is defined to specify consumer pool allocation strategy implementation of ConsumerPoolStrategy.
AggKafkaConsumerService#subscribeConsumerFor will pass PartitionReplicaIngestionContext with VersionRole and WorkloadType information to let CurrentVersionConsumerPoolStrategy pick up the specific consumer pool. Each instance of KakfaConsumerService will be built inside ConsumerPoolStrategy.

CurrentVersionConsumerPoolStrategy will support for 4 different consumer pools with these 4 configs:

  1. SERVER_CONSUMER_POOL_SIZE_FOR_CURRENT_VERSION_AA_WC_LEADER
  2. SERVER_CONSUMER_POOL_SIZE_FOR_NON_CURRENT_VERSION_AA_WC_LEADER
  3. SERVER_CONSUMER_POOL_SIZE_FOR_CURRENT_VERSION_NON_AA_WC_LEADER
  4. SERVER_CONSUMER_POOL_SIZE_FOR_NON_CURRENT_VERSION_NON_AA_WC_LEADER

DefaultConsumerPoolStrategy will support 1 pool per region with original default setting.

AAOrWCLeaderConsumerPoolStrategy will support for 1 more pool with: SERVER_DEDICATED_CONSUMER_POOL_SIZE_FOR_AA_WC_LEADER. Here for config backward-compatibility, we will keep relying on SERVER_DEDICATED_CONSUMER_POOL_FOR_AA_WC_LEADER_ENABLED to turn on this feature, SERVER_CONSUMER_POOL_ALLOCATION_STRATEGY is only useful when SERVER_DEDICATED_CONSUMER_POOL_FOR_AA_WC_LEADER_ENABLED is turned off.

After we validate the CurrentVersionConsumerPoolStrategy, we will try to clean up the configs and rely on SERVER_CONSUMER_POOL_ALLOCATION_STRATEGY to switch strategy.

Besides, this change also enable re-subscritpion based on workload type change.

Summary, imperative, start upper case, don't end with a period

Resolves #XXX

How was this PR tested?

Does this PR introduce any user-facing changes?

  • [ ] No. You can skip the rest of this section.
  • [ ] Yes. Make sure to explain your proposed changes and call out the behavior change.

haoxu07 avatar Aug 05 '24 09:08 haoxu07

My change has a quite number of conflicts with this PR: #1102 I will let you merge first as I might need to add throttler for each consumer pool.

Can we add a check to make sure the version jumping feature will be enabled together with the 4-pool strategy?

Yeah, I also see your change. Definitely I could add check for the version jumping feature will be enabled together with the 4-pool strategy.

haoxu07 avatar Aug 09 '24 17:08 haoxu07

The code change looks good to me. Do we have a test case, which will go through these changes, so that we can observe the consumer pool change?

future version -> current version
current version -> backup version
non-write-compute -> write-compute
write-compute -> non-write-compute

We had unit test for role change triggering re-subscription and, will add an integration test in next PR for verifying these change.

haoxu07 avatar Aug 14 '24 16:08 haoxu07