Load balancer behavior on node expansion
When scaling from 3 to 6 nodes, the load balancer considers the new nodes one at a time, which leads to shards being transferred multiple times.
For example, in the log below it detects the 3 new nodes, then proposes moving all shards from oxia-1 to oxia-3. Later it will move some shards to oxia-4 and oxia-5, and finally it will rebalance the leaders.
Oct 22 21:26:16.039 [INF] Detected new node component=coordinator server.name=null server.public=oxia-3:6648 server.internal=oxia-3:6649
Oct 22 21:26:16.039 [INF] Started node controller component=node-controller node=oxia-3:6649
Oct 22 21:26:16.039 [INF] Detected new node component=coordinator server.name=null server.public=oxia-4:6648 server.internal=oxia-4:6649
Oct 22 21:26:16.039 [INF] Started node controller component=node-controller node=oxia-4:6649
Oct 22 21:26:16.039 [INF] Detected new node component=coordinator server.name=null server.public=oxia-5:6648 server.internal=oxia-5:6649
Oct 22 21:26:16.039 [INF] Started node controller component=node-controller node=oxia-5:6649
Oct 22 21:26:16.097 [WRN] Failed to check storage node health by watch component=node-controller error.error="rpc error: code = Unavailable desc = name resolver error: produced zero addresses" error.kind=*status.Error error.stack=null node=oxia-4:6649 retry-after=7534.176346
Oct 22 21:26:16.160 [INF] manually trigger balance component=load-balancer
Oct 22 21:26:16.160 [INF] start shard rebalance avg-shard-ratio=0.008333333333333333 component=load-balancer max-node-load-ratio=0.3333333333333333 min-node-load-ratio=0 quarantine-nodes=[]
Oct 22 21:26:16.160 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=0 to=oxia-3:6649
Oct 22 21:26:16.160 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=2 to=oxia-3:6649
Oct 22 21:26:16.160 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=1 to=oxia-3:6649
Oct 22 21:26:16.160 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=30 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=8 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=23 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=32 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=27 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=12 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] Applying swap action component=coordinator swap-action.Shard=0 swap-action.From.name=null swap-action.From.public=oxia-1:6648 swap-action.From.internal=oxia-1:6649 swap-action.To.name=null swap-action.To.public=oxia-3:6648 swap-action.To.internal=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=16 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=26 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=13 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=29 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=10 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=24 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=39 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=18 to=oxia-3:6649
Oct 22 21:26:16.161 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=21 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=34 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=19 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=35 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=14 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=9 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=37 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=17 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=22 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=15 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=33 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=31 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=20 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=25 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=38 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=28 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=36 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=11 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=7 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=6 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=4 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] propose to swap the shard component=load-balancer from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 shard=3 to=oxia-3:6649
Oct 22 21:26:16.162 [INF] Swapping node component=shard-controller from.name=null from.public=oxia-1:6648 from.internal=oxia-1:6649 namespace=bookkeeper new-ensemble=[{ name=null public=oxia-0:6648 internal=oxia-0:6649 } { name=null public=oxia-2:6648 internal=oxia-2:6649 } { name=null public=oxia-3:6648 internal=oxia-3:6649 }] removed-nodes=[{ name=null public=oxia-1:6648 internal=oxia-1:6649 }] shard=0 to.name=null to.public=oxia-3:6648 to.internal=oxia-3:6649
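To put a rough number on the extra transfers, here is a minimal Go sketch. It is not Oxia's balancer: `Placement`, `Move`, `rebalance` and `initial` are hypothetical names, and the greedy "most loaded to least loaded" policy is an assumption made only for illustration. The starting state (40 shards, 3 replicas, all on oxia-0..oxia-2) is inferred from the ratios in the log above (avg-shard-ratio = 1/120, max-node-load-ratio = 1/3). It compares a single rebalance over all 6 nodes with three rebalances that each see one more node.

```go
package main

import (
	"fmt"
	"sort"
)

// Move records one replica transfer (hypothetical type, not Oxia's).
type Move struct {
	Shard    int
	From, To string
}

// Placement maps a node name to the set of shard IDs it hosts.
type Placement map[string]map[int]bool

// rebalance repeatedly moves one replica from the most loaded node to the
// least loaded node in `nodes` until their loads differ by at most one.
// It mutates p and returns the moves it made.
func rebalance(p Placement, nodes []string) []Move {
	if len(nodes) == 0 {
		return nil
	}
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted) // deterministic tie-breaking by node name
	for _, n := range sorted {
		if p[n] == nil {
			p[n] = map[int]bool{} // a newly added node starts empty
		}
	}
	var moves []Move
	for {
		hi, lo := sorted[0], sorted[0]
		for _, n := range sorted {
			if len(p[n]) > len(p[hi]) {
				hi = n
			}
			if len(p[n]) < len(p[lo]) {
				lo = n
			}
		}
		if len(p[hi])-len(p[lo]) <= 1 {
			return moves
		}
		// Pick the lowest shard ID on hi that lo does not already host,
		// so two replicas of one shard never land on the same node.
		shard := -1
		for s := range p[hi] {
			if !p[lo][s] && (shard == -1 || s < shard) {
				shard = s
			}
		}
		if shard == -1 {
			return moves // defensive: hi hosts more shards than lo, so one always exists
		}
		delete(p[hi], shard)
		p[lo][shard] = true
		moves = append(moves, Move{Shard: shard, From: hi, To: lo})
	}
}

// initial builds the assumed starting state: 40 shards, each replicated
// on oxia-0, oxia-1 and oxia-2.
func initial() Placement {
	p := Placement{}
	for _, n := range []string{"oxia-0", "oxia-1", "oxia-2"} {
		p[n] = map[int]bool{}
		for s := 0; s < 40; s++ {
			p[n][s] = true
		}
	}
	return p
}

func main() {
	all := []string{"oxia-0", "oxia-1", "oxia-2", "oxia-3", "oxia-4", "oxia-5"}

	// One pass that sees all six nodes at once.
	onePass := len(rebalance(initial(), all))

	// Three passes, each seeing one more new node than the previous one.
	p, stepwise := initial(), 0
	for n := 4; n <= len(all); n++ {
		stepwise += len(rebalance(p, all[:n]))
	}

	fmt.Println("one pass:", onePass, "moves")      // one pass: 60 moves
	fmt.Println("node by node:", stepwise, "moves") // node by node: 74 moves
}
```

In this simplified model the node-by-node variant makes 30 + 24 + 20 = 74 transfers against 60 for the single pass, because some replicas that were moved to oxia-3 get moved again once oxia-4 and oxia-5 become eligible. The real gap depends on how the coordinator actually picks targets, which this sketch does not try to reproduce.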
Oct 22 23:50:13.269 [WRN] Failed to newTerm, retrying later component=shard-controller error.error="rpc error: code = Unknown desc = failed to reopen database: failed to open database at /data/db/ursa-storage/shard-32: lock held by current process" error.kind=*status.Error error.stack=null follower.name=null follower.public=oxia-4:6648 follower.internal=oxia-4:6649 namespace=ursa-storage retry-after=2176.780657 shard=32 term=46
:/ that's unexpected. Let me check it