helix Request for Fair State Transition Scheduling Across Multiple Resources

Is your feature request related to a problem? Please describe.

We use CUSTOMIZED mode, and today there does not seem to be any fairness in how Helix state transition messages are sent out to the cluster across different resources.

For example, say I have two resources, TableA and TableB, that share the same instances. If I update TableA with a very large number of IdealState changes (say 10K), then updates to TableB end up queued behind TableA, and no progress is made on TableB until TableA's state transitions complete.

This leads to starvation for TableB. Ideally, resources should not affect each other in a Helix cluster.

Describe the solution you'd like

Ideally, Helix state transitions should be scheduled fairly across the resources in a cluster to prevent the starvation scenario above.
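The following is a toy Python model of the starvation scenario. It compares strict first-in-first-out dispatch with round-robin dispatch across resources. It is not how Helix builds or sends messages. `fifo_schedule`, `round_robin_schedule`, and the per-round `budget` (how many messages can go out in one round) are all hypothetical.

```python
from collections import deque


def fifo_schedule(queues, budget):
    """Send messages strictly in arrival order: every message of an earlier
    resource goes out before any message of a later one."""
    sent = []
    for resource, messages in queues.items():
        for msg in messages:
            if len(sent) == budget:
                return sent
            sent.append((resource, msg))
    return sent


def round_robin_schedule(queues, budget):
    """Take one message from each resource in turn until the budget is used up
    or every queue is empty."""
    pending = {resource: deque(msgs) for resource, msgs in queues.items()}
    sent = []
    while len(sent) < budget and any(pending.values()):
        for resource, queue in pending.items():
            if queue and len(sent) < budget:
                sent.append((resource, queue.popleft()))
    return sent


# TableA received ~10K IdealState changes just before TableB's few updates.
queues = {
    "TableA": [f"A_{i}" for i in range(10_000)],
    "TableB": [f"B_{i}" for i in range(5)],
}
fifo = fifo_schedule(queues, budget=10)
fair = round_robin_schedule(queues, budget=10)
print(sum(r == "TableB" for r, _ in fifo))  # 0: TableB starves
print(sum(r == "TableB" for r, _ in fair))  # 5: TableB makes progress
```

Round-robin is only one possible notion of fairness. A weighted scheme or a per-resource cap would also keep TableB from waiting behind all of TableA's messages.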

Additional context

I know WAGED solves some of these problems, but moving to WAGED is not an option for us. Are there any other solutions that can be leveraged?

cc @junkaixue @zpinto

Apr 15 '25 22:04 somandal

Helix supports fine-grained message throttling that lets users configure state transition message limits per cluster/instance/resource/partition.

It is set in the cluster config:

    "STATE_TRANSITION_THROTTLE_CONFIGS": [
      "{\"THROTTLE_SCOPE\":\"RESOURCE\",\"MAX_PARTITION_IN_TRANSITION\":\"20\",\"REBALANCE_TYPE\":\"LOAD_BALANCE\"}"
    ],

If we set a resource-level state transition throttle, each resource gets at most 20 partitions in transition, so TableB won't starve.
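This is a minimal sketch of what a RESOURCE-scope throttle does to the example above. It assumes a simplified model in which each resource may have at most MAX_PARTITION_IN_TRANSITION partitions in transition at a time. The config value is a list of JSON-encoded strings, parsed here with the standard `json` module. `resource_cap` and `select_transitions` are hypothetical helpers, not Helix APIs, and REBALANCE_TYPE is ignored in this sketch.

```python
import json

# Value of STATE_TRANSITION_THROTTLE_CONFIGS as stored in the cluster config:
# a list of JSON-encoded strings (same content as the example above).
throttle_configs = [
    '{"THROTTLE_SCOPE":"RESOURCE","MAX_PARTITION_IN_TRANSITION":"20","REBALANCE_TYPE":"LOAD_BALANCE"}'
]


def resource_cap(configs):
    """Return the smallest RESOURCE-scope MAX_PARTITION_IN_TRANSITION, or None.
    REBALANCE_TYPE is deliberately ignored in this simplified model."""
    caps = [
        int(cfg["MAX_PARTITION_IN_TRANSITION"])
        for cfg in map(json.loads, configs)
        if cfg["THROTTLE_SCOPE"] == "RESOURCE"
    ]
    return min(caps) if caps else None


def select_transitions(pending, cap):
    """Admit at most `cap` partition transitions per resource (None = no cap)."""
    return {resource: parts[:cap] for resource, parts in pending.items()}


pending = {
    "TableA": [f"TableA_{i}" for i in range(10_000)],
    "TableB": [f"TableB_{i}" for i in range(3)],
}
cap = resource_cap(throttle_configs)
admitted = select_transitions(pending, cap)
print(cap, len(admitted["TableA"]), len(admitted["TableB"]))  # 20 20 3
```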

Apr 15 '25 22:04 xyuanlu

Got it, thanks @xyuanlu. I'll look into this. What are the different REBALANCE_TYPE values that are relevant for CUSTOMIZED mode?

Apr 15 '25 22:04 somandal

REBALANCE_TYPE indicates whether the transition is part of a recovery rebalance or a load rebalance:

recovery rebalance -> the partition currently has fewer than the minimum number of active replicas, or is missing its top state.

load rebalance -> the minimum replica requirement is met; the current state just needs to converge to the ideal state.
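The two cases above can be written as a small classification function. This is a hypothetical sketch of the rule as stated in this comment, not Helix's internal code. `classify` and `RebalanceKind` are illustrative names.

```python
from enum import Enum


class RebalanceKind(Enum):
    """Illustrative labels for the two cases described above (not Helix's enum)."""
    RECOVERY = "recovery"
    LOAD_BALANCE = "load_balance"


def classify(active_replicas, min_active_replicas, has_top_state):
    """Recovery if the partition is below min active replicas or lacks a
    top-state replica; otherwise it only needs to converge (load balance)."""
    if active_replicas < min_active_replicas or not has_top_state:
        return RebalanceKind.RECOVERY
    return RebalanceKind.LOAD_BALANCE


print(classify(3, 2, True).name)   # LOAD_BALANCE
print(classify(1, 2, True).name)   # RECOVERY
print(classify(3, 2, False).name)  # RECOVERY
```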

Apr 15 '25 22:04 xyuanlu

@somandal May I ask does setting the throttling config work? :D

Apr 24 '25 17:04 xyuanlu

@xyuanlu we haven't had a chance to try this out yet. Once we do, I'll let you know. Thanks!

Apr 24 '25 18:04 somandal

@xyuanlu In our use case, MESSAGE_CONSTRAINT is set to throttle state transitions to a certain value. If STATE_TRANSITION_THROTTLE_CONFIGS is also set, does it still kick in, or is it overridden by the STATE_TRANSITION constraint value?

    "STATE_TRANSITION_THROTTLE_CONFIGS": [
      "{\"THROTTLE_SCOPE\":\"RESOURCE\",\"MAX_PARTITION_IN_TRANSITION\":\"20\",\"REBALANCE_TYPE\":\"LOAD_BALANCE\"}"
    ],

And there are other rebalance types like ANY and NONE. When should I be using them?

May 20 '25 20:05 deepthi912

And there are other rebalance types like ANY and NONE. When should I be using them?

-> ANY includes both load rebalance and recovery rebalance. NONE is an error value and should not be used.
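The sketch below shows how a throttle config's REBALANCE_TYPE could select which transitions it limits, following the answer above: ANY covers both kinds, and NONE is rejected as an error value. `throttle_applies` and `COVERED_KINDS` are hypothetical. The recovery-only value is assumed here to be spelled `RECOVERY_BALANCE`; check the Helix enum for the exact name.

```python
# Which kinds of movement each REBALANCE_TYPE value throttles (simplified model).
# "RECOVERY_BALANCE" is an assumed spelling for the recovery-only value.
COVERED_KINDS = {
    "ANY": {"RECOVERY", "LOAD_BALANCE"},
    "LOAD_BALANCE": {"LOAD_BALANCE"},
    "RECOVERY_BALANCE": {"RECOVERY"},
}


def throttle_applies(rebalance_type, transition_kind):
    """True if a config with `rebalance_type` limits a `transition_kind` move."""
    if rebalance_type == "NONE":
        raise ValueError("REBALANCE_TYPE NONE is an error value; do not configure it")
    return transition_kind in COVERED_KINDS[rebalance_type]


print(throttle_applies("ANY", "RECOVERY"))           # True
print(throttle_applies("LOAD_BALANCE", "RECOVERY"))  # False
```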

May 20 '25 20:05 xyuanlu

If both MESSAGE_CONSTRAINT and STATE_TRANSITION_THROTTLE_CONFIGS are set, they are both honored.
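Assume "both are honored" means a message goes out only if it fits under every active limit. Then the effective cap is the stricter of the two. The minimal sketch below uses that assumption. It models MESSAGE_CONSTRAINT as a per-instance cap and the throttle config as a per-resource cap, which is a simplification. `can_send` and `dispatch` are hypothetical.

```python
from collections import Counter


def can_send(msg, in_flight, instance_limit, resource_limit):
    """A message goes out only if it fits under BOTH limits.

    in_flight: (instance, resource) pairs already sent and not yet completed.
    instance_limit: per-instance cap, standing in for a MESSAGE_CONSTRAINT.
    resource_limit: per-resource cap, standing in for a RESOURCE-scope throttle.
    """
    instance, resource = msg
    per_instance = Counter(i for i, _ in in_flight)
    per_resource = Counter(r for _, r in in_flight)
    return per_instance[instance] < instance_limit and per_resource[resource] < resource_limit


def dispatch(queue, instance_limit, resource_limit):
    """Walk the queue once and return the messages admitted under both caps."""
    in_flight = []
    for msg in queue:
        if can_send(msg, in_flight, instance_limit, resource_limit):
            in_flight.append(msg)
    return in_flight


# One resource on one instance: the per-resource cap (1) is the binding limit.
same_resource = [("server_1", "fake_table18_REALTIME")] * 15
print(len(dispatch(same_resource, instance_limit=10, resource_limit=1)))  # 1

# Many resources on one instance: the per-instance cap (10) is the binding limit.
many_resources = [("server_1", f"table_{i}") for i in range(15)]
print(len(dispatch(many_resources, instance_limit=10, resource_limit=1)))  # 10
```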

May 20 '25 21:05 xyuanlu

So, this is what the _pendingMessagesMap looks like in Helix when the throttling settings are enabled:

Here, instead of sending only one transition message for the resource fake_table18_REALTIME, Helix sends out 10 messages with the following throttling config set. I am unsure whether I need to tweak some other configs to allow at most 1 transition message for this resource, because the StateTransitionThrottleController seems to be picking up the right configs:

    {"THROTTLE_SCOPE":"RESOURCE","MAX_PARTITION_IN_TRANSITION":"1","REBALANCE_TYPE":"LOAD_BALANCE"}

I have also set the max STATE_TRANSITION constraint to 10, which explains the 10 transitions at the moment.

    Cluster Config ZNRecord: PinotCluster, {STATE_TRANSITION.maxThreads=10, allowParticipantAutoJoin=true, controller.replication.threshold=3, default.hyperloglog.log2m=8, enable.case.insensitive=true, pinot.beta.multistage.engine.max.server.query.threads=1000, pinot.broker.enable.query.limit.override=false, pinot.forward.index.default.raw.index.writer.version=4, pinot.helix.instance.state.maxStateTransitions=10, zk.client.session.timeout.ms=300000}{}{STATE_TRANSITION_THROTTLE_CONFIGS=[{"THROTTLE_SCOPE":"RESOURCE","MAX_PARTITION_IN_TRANSITION":"1","REBALANCE_TYPE":"ANY"}]}
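Assume both limits must be satisfied, as stated in the earlier answer that both are honored. With the per-resource throttle at 1 and the STATE_TRANSITION constraint at 10, at most min(10, 1) = 1 transition should be pending for fake_table18_REALTIME. The observed 10 matches the constraint alone. The hypothetical helper below counts pending messages per resource from a flattened list of (instance, resource, partition) tuples and reports any resource over the cap. The sample data only reproduces the reported count of 10; the actual _pendingMessagesMap contents are not shown, and the partition names are made up.

```python
from collections import Counter


def over_throttle(pending_messages, max_per_resource):
    """Return {resource: count} for resources whose pending count exceeds the cap.

    pending_messages: iterable of (instance, resource, partition) tuples, a
    hypothetical flattened view of pending state-transition messages.
    """
    counts = Counter(resource for _, resource, _ in pending_messages)
    return {r: n for r, n in counts.items() if n > max_per_resource}


# Illustrative data matching the reported count: 10 pending for one resource.
observed = [("server_1", "fake_table18_REALTIME", f"p{i}") for i in range(10)]
print(over_throttle(observed, max_per_resource=1))  # {'fake_table18_REALTIME': 10}
```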

May 21 '25 20:05 deepthi912

@xyuanlu I don't think the throttling configs are helping us out. Can you help here?

May 21 '25 21:05 deepthi912

@deepthi912 is the behavior the same if you set that config with REBALANCE_TYPE ANY? I'm wondering whether the other state transition messages were generated under a different REBALANCE_TYPE.

May 22 '25 03:05 somandal

@somandal Yes, I tried the ANY, RECOVERY_REBALANCE, and LOAD_BALANCE REBALANCE_TYPE values.

May 22 '25 05:05 deepthi912
