helix
Request for Fair State Transition Scheduling Across Multiple Resources
Is your feature request related to a problem? Please describe. We use CUSTOMIZED mode, and today there does not seem to be any fairness in how Helix state transition messages are sent out to the cluster across different resources.
For example, say I have two resources, TableA and TableB, that share the same instances. If I update TableA with a very large number of IdealState changes (say 10K), then updates to TableB end up queued behind TableA, and TableB makes no progress until TableA's state transitions complete.
This leads to starvation for TableB. Ideally resources should not affect each other in a Helix cluster.
Describe the solution you'd like Ideally the Helix state transitions should be scheduled fairly across resources in a cluster to prevent the above starvation scenario.
Additional context I know WAGED solves some of these problems, but moving to WAGED is not an option for us. Are there any other solutions that can be leveraged?
cc @junkaixue @zpinto
Helix supports fine-grained message throttling that lets users configure a state transition message limit per cluster/instance/resource/partition.
It is set in the cluster config:
"STATE_TRANSITION_THROTTLE_CONFIGS": [
"{\"THROTTLE_SCOPE\":\"RESOURCE\",\"MAX_PARTITION_IN_TRANSITION\":\"20\",\"REBALANCE_TYPE\":\"LOAD_BALANCE\"}"
],
If we set a resource-level state transition throttle, every resource gets at most 20 state transitions at a time, and TableB won't starve.
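Each entry in the list field above is itself a JSON string, which is why the quotes are escaped. A minimal sketch of producing such a fragment, using only the Python standard library; the helper names `throttle_entry` and `throttle_list_field` are hypothetical, and this only builds the text, it does not write anything to ZooKeeper or call any Helix API:

```python
import json


def throttle_entry(scope, max_partitions, rebalance_type):
    """Serialize one throttle rule as the JSON string stored in the list field."""
    return json.dumps(
        {
            "THROTTLE_SCOPE": scope,
            # The value is stored as a string in the example above.
            "MAX_PARTITION_IN_TRANSITION": str(max_partitions),
            "REBALANCE_TYPE": rebalance_type,
        },
        separators=(",", ":"),
    )


def throttle_list_field(entries):
    """Wrap serialized rules under the list-field key shown in the thread."""
    return {"STATE_TRANSITION_THROTTLE_CONFIGS": list(entries)}


fragment = throttle_list_field([throttle_entry("RESOURCE", 20, "LOAD_BALANCE")])
# Dumping the outer dict escapes the inner quotes, matching the form above.
print(json.dumps(fragment, indent=2))
```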
got it, thanks @xyuanlu I'll look into this
what are the different REBALANCE_TYPE values that are relevant for CUSTOMIZED mode?
REBALANCE_TYPE indicates whether it is a recovery rebalance or a load rebalance.
recovery rebalance -> the resource currently has fewer than the minimum active replicas, or is missing the top state.
load rebalance -> the minimum replica requirement is met, and the current state just needs to converge to the ideal state.
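The distinction described above can be sketched as a small decision function. This is only an illustration of the rule as stated in this comment, not Helix's actual code; the function name, its parameters, and the returned labels are all hypothetical:

```python
def classify_rebalance(active_replicas, min_active_replicas,
                       has_top_state, matches_ideal_state):
    """Label a partition's pending work per the description in this thread."""
    # Fewer than the minimum active replicas, or no top state: recovery.
    if active_replicas < min_active_replicas or not has_top_state:
        return "recovery rebalance"
    # Requirements are met, but current state still differs from ideal state.
    if not matches_ideal_state:
        return "load rebalance"
    # Already converged: nothing to schedule.
    return None
```

For example, a partition with 1 active replica against a minimum of 2 would be labeled as recovery, while a healthy partition that merely needs to move to a new instance would be labeled as load.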
@somandal May I ask whether setting the throttling config worked? :D
@xyuanlu we haven't had a chance to try this out yet. Once we get a chance I'll let you know. thanks!
@xyuanlu In our use case, MESSAGE_CONSTRAINT is set to throttle to a certain value. If STATE_TRANSITION_THROTTLE_CONFIGS is also set, does it not kick in, or will it be overridden by the STATE_TRANSITION constraint value?
"STATE_TRANSITION_THROTTLE_CONFIGS": [
"{\"THROTTLE_SCOPE\":\"RESOURCE\",\"MAX_PARTITION_IN_TRANSITION\":\"20\",\"REBALANCE_TYPE\":\"LOAD_BALANCE\"}"
],
And there are other rebalance types like ANY and NONE. When should I be using them?
And there are other rebalance types like ANY and NONE. When should I be using them?
-> ANY includes both load rebalance and recovery rebalance. NONE is an error value and should not be used.
If both MESSAGE_CONSTRAINT and STATE_TRANSITION_THROTTLE_CONFIGS are set, they are both honored.
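One way to read "both honored" is that a message goes out only if it passes both limits. The sketch below illustrates that reading under stated assumptions: the message constraint is treated as a per-instance cap and the throttle config as a per-resource cap, and the function `admit` is hypothetical. It is not how Helix implements either mechanism.

```python
from collections import Counter


def admit(messages, per_resource_limit, per_instance_limit):
    """Admit (resource, instance) messages, in order, only while both caps hold."""
    by_resource = Counter()
    by_instance = Counter()
    admitted = []
    for resource, instance in messages:
        # Resource-scoped throttle (assumed from STATE_TRANSITION_THROTTLE_CONFIGS).
        if by_resource[resource] >= per_resource_limit:
            continue
        # Instance-scoped cap (assumed from MESSAGE_CONSTRAINT).
        if by_instance[instance] >= per_instance_limit:
            continue
        by_resource[resource] += 1
        by_instance[instance] += 1
        admitted.append((resource, instance))
    return admitted
```

Under this reading, a per-resource limit of 1 combined with a per-instance limit of 10 would let through at most one message per resource, regardless of the larger per-instance cap.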
So, this is what the _pendingMessagesMap looks like in Helix when the throttling settings are enabled:
Instead of giving out only one transition message for the resource fake_table18_REALTIME, it gives out 10 messages when the following throttle is set. I am unsure whether I need to tweak some other configs to allow at most 1 transition message for this resource, because the StateTransitionThrottleController seems to be picking up the right configs as well:
{"THROTTLE_SCOPE":"RESOURCE","MAX_PARTITION_IN_TRANSITION":"1","REBALANCE_TYPE":"LOAD_BALANCE"}
Also, I have set the max STATE_TRANSITION constraint to 10, which explains the 10 transitions at the moment.
Cluster Config ZNRecord:
PinotCluster, {STATE_TRANSITION.maxThreads=10, allowParticipantAutoJoin=true, controller.replication.threshold=3, default.hyperloglog.log2m=8, enable.case.insensitive=true, pinot.beta.multistage.engine.max.server.query.threads=1000, pinot.broker.enable.query.limit.override=false, pinot.forward.index.default.raw.index.writer.version=4, pinot.helix.instance.state.maxStateTransitions=10, zk.client.session.timeout.ms=300000}{}{STATE_TRANSITION_THROTTLE_CONFIGS=[{"THROTTLE_SCOPE":"RESOURCE","MAX_PARTITION_IN_TRANSITION":"1","REBALANCE_TYPE":"ANY"}]}
@xyuanlu I don't think the throttling configs are helping us out. Can you help here?
@deepthi912 is the behavior the same if you set that config for REBALANCE_TYPE: ANY? I'm just wondering if the other state transition messages were added due to the other REBALANCE_TYPE
@somandal Yes, I tried out the ANY, RECOVERY_REBALANCE and LOAD_BALANCE values of REBALANCE_TYPE.