Infinite resource balancing issue
Describe the bug
I have 5 resources in my cluster. Each node in my cluster acts as both participant and controller (only one is elected as controller at a time). I am running a leader-follower state model with 1 leader and 1 follower. As long as I am running 1 or 2 nodes, the cluster forms correctly and all resources are correctly assigned.
However, as soon as I add a third node, partition assignment runs continuously and never completes.
To Reproduce
Steps to reproduce the behaviour.
Expected behaviour
The cluster should become stable after the 3rd node is added.
Additional context
Helix version - 1.0.2
Zookeeper version - 3.4.8-1--1
Application Java version - 1.8
Error logs
ERROR [2022-01-31 22:43:43,101] [ZkClient-EventThread-219-localhost:2181/apache-helix-clusters] [HelixTaskExecutor]: Message fd355807-0194-4cc9-ac17-f02cee8debbe cannot be processed: fd355807-0194-4cc9-ac17-f02cee8debbe, {CREATE_TIMESTAMP=1643649222888, ClusterEventName=CurrentStateChange, FROM_STATE=LEADER, MSG_ID=fd355807-0194-4cc9-ac17-f02cee8debbe, MSG_STATE=new, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=TE2201281333044107151256_12, RESOURCE_NAME=TE2201281333044107151256, RETRY_COUNT=3, SRC_NAME=e51087cc-2713-4ff2-bcba-5dbffc0f8638, SRC_SESSION_ID=579a797550e896c, STATE_MODEL_DEF=MatchmakerLeaderStandBy, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=82bd5be6-47ba-467d-a53e-f4c6e77d1f0d, TGT_SESSION_ID=579a797550e8970, TO_STATE=STANDBY}{}{}Partition TE2201281333044107151256_12 current state is same as toState (LEADER->STANDBY) from message.
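When many of these errors scroll by, it can help to pull out just the partition, target instance, and requested transition from each line to see which partitions keep being reassigned. The following is a minimal sketch using only the Java standard library. The class and method names (`RejectedTransitionParser`, `parseFields`, `summarize`) are hypothetical and not part of Helix, and the parser assumes the message fields appear in the first `{KEY=VALUE, ...}` block of the line, with no `, ` or `}` inside any value, as in the log line above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Helix: extracts the KEY=VALUE fields
// from a HelixTaskExecutor "cannot be processed" log line.
class RejectedTransitionParser {

    // Parses the first {KEY=VALUE, KEY=VALUE, ...} block into an ordered map.
    // Returns an empty map if the line contains no such block.
    static Map<String, String> parseFields(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        int open = logLine.indexOf('{');
        int close = logLine.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            return fields;
        }
        // Assumption: pairs are separated by ", " and values contain no ", ".
        for (String pair : logLine.substring(open + 1, close).split(", ")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    // One-line summary: "<partition> on <target instance>: <from>-><to>".
    static String summarize(String logLine) {
        Map<String, String> f = parseFields(logLine);
        return f.get("PARTITION_NAME") + " on " + f.get("TGT_NAME") + ": "
                + f.get("FROM_STATE") + "->" + f.get("TO_STATE");
    }
}
```

Feeding each error line through `summarize` and counting how often the same partition appears gives a rough picture of which partitions are being moved back and forth between instances.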
Screenshot of UI
Please let me know if any additional info is required.
@r0goyal Are you using the default assignment algorithm? If so, this is a known issue that we refer to as flip-flop assignment. I suggest switching to the CrushED-based assignment. Or, if you are on 1.0+, you can try WAGED.