[Bug] Leader balance doesn't work well
Describe the bug (required)
In our cluster there are 8 hosts, and each host has 54 partitions. As the replica factor is 3, each host should have 18 leaders on average. However, after leader balance, the leader distribution across the hosts (call them h0, h1, ..., h7) is 15, 18, 18, 18, 18, 19, 19, 19. I think this balance result is not good enough; can we make the balance give every host exactly 18 leaders?
More information: the partition peers of h0 are only h1, h2, h3 and h4, and those 4 hosts have 18 leaders each.
The leader balance code is here; it seems that when h0 wants to take a leader from h1, h2, h3 or h4, the move fails because the condition "minLoad < sourceLeaders.size()" is not met.
So maybe we need a better strategy for leader balance: instead of looking only at a partition's peers, the balancer may need to consider the whole cluster.
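To make the failure mode concrete, below is a minimal, self-contained sketch of the rule described above. It is not the actual Nebula implementation; the host names, the peer set and the minLoad formula are assumptions for illustration only.

```cpp
// A minimal, simplified model (NOT the actual Nebula code) of the peer-only
// leader balance rule described above.
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    // Leader counts from this issue: h0 has 15 leaders, its only partition
    // peers (h1..h4) have 18 each, and the remaining hosts have 19 each.
    std::map<std::string, int> leaders = {
        {"h0", 15}, {"h1", 18}, {"h2", 18}, {"h3", 18},
        {"h4", 18}, {"h5", 19}, {"h6", 19}, {"h7", 19}};
    // h0 only shares partitions with h1..h4, so it can only take leaders
    // from these four hosts.
    std::set<std::string> peersOfH0 = {"h1", "h2", "h3", "h4"};

    int total = 0;
    for (const auto& kv : leaders) total += kv.second;
    int minLoad = total / static_cast<int>(leaders.size());  // 144 / 8 = 18

    // Try to move one leader onto h0 from any of its peers.
    bool moved = false;
    for (const auto& src : peersOfH0) {
        // The condition this issue points at: a source host only gives up a
        // leader when it holds MORE than minLoad leaders.
        if (minLoad < leaders[src]) {
            --leaders[src];
            ++leaders["h0"];
            moved = true;
            break;
        }
    }
    std::cout << (moved ? "moved one leader to h0"
                        : "stuck: every peer of h0 already sits at minLoad")
              << " (minLoad = " << minLoad << ")\n";
    return 0;
}
```

With the counts from this issue, the check is "18 < 18", which is never true, so h0 stays at 15 while h5, h6 and h7 keep 19.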
Your Environments (required)
- OS: uname -a
- Compiler: g++ --version or clang++ --version
- CPU: lscpu
- Commit id (e.g. a3ffc7d8)
How To Reproduce (required)
Steps to reproduce the behavior:
- Step 1
- Step 2
- Step 3
Expected behavior
Additional context
It seems that when h0 wants to take a leader from h1, h2, h3 or h4, the move fails because the condition "minLoad < sourceLeaders.size()" is not met.
In your example, what is the minLoad of h0, 18?
I think the scenario you describe does exist: h0 only overlaps with h1, h2, h3 and h4, but they all have 18 leaders.
But do we really need to make it a perfect 18?
In your example, what is the minLoad of h0, 18?
Yes, minLoad is 18, maxLoad is 19
I think the scenario you describe does exist: h0 only overlaps with h1, h2, h3 and h4, but they all have 18 leaders.
But do we really need to make it a perfect 18?
When the cluster is under high access pressure, for example when a server's CPU usage is nearly full, clients will receive many errors because one or more machines are under higher pressure while other machines may still have headroom.
I think it would be better if each server held exactly 18 leaders, and if that can be done easily there is no harm, so it is worth doing.
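One way to read the "consider the whole cluster" suggestion is to allow a leader to be shifted along a chain of peers, so an overloaded host that shares no partition with the underloaded host can still hand a leader down through intermediate hosts. The sketch below only illustrates that idea and is not Nebula's algorithm; the peer adjacency is invented, and real code would also have to pick a concrete partition for every hop.

```cpp
// Illustrative sketch: move one leader along a peer path from an over-loaded
// host to an under-loaded one. Intermediate hosts end up unchanged.
#include <iostream>
#include <map>
#include <queue>
#include <set>
#include <string>

using Counts = std::map<std::string, int>;
using Peers  = std::map<std::string, std::set<std::string>>;

// Returns false when no over-loaded host can reach an under-loaded host.
bool shiftOneLeader(Counts& leaders, const Peers& peers, int target) {
    for (const auto& [src, n] : leaders) {
        if (n <= target) continue;               // src is not over-loaded
        // BFS from the over-loaded host towards any under-loaded host.
        std::map<std::string, std::string> prev;
        std::queue<std::string> q;
        q.push(src);
        prev[src] = src;
        while (!q.empty()) {
            std::string cur = q.front();
            q.pop();
            if (leaders.at(cur) < target) {      // found an under-loaded host
                // Walk back along the path, moving one leader per hop.
                for (std::string h = cur; h != src; h = prev[h]) {
                    ++leaders[h];
                    --leaders[prev[h]];
                }
                return true;
            }
            auto it = peers.find(cur);
            if (it == peers.end()) continue;
            for (const auto& nb : it->second) {
                if (!prev.count(nb)) {
                    prev[nb] = cur;
                    q.push(nb);
                }
            }
        }
    }
    return false;
}

int main() {
    Counts leaders = {{"h0", 15}, {"h1", 18}, {"h2", 18}, {"h3", 18},
                      {"h4", 18}, {"h5", 19}, {"h6", 19}, {"h7", 19}};
    // Assumed adjacency (purely illustrative): h0 only shares partitions
    // with h1..h4, while h5..h7 share partitions with h1..h4.
    Peers peers = {
        {"h0", {"h1", "h2", "h3", "h4"}},
        {"h1", {"h0", "h5", "h6", "h7"}}, {"h2", {"h0", "h5", "h6", "h7"}},
        {"h3", {"h0", "h5", "h6", "h7"}}, {"h4", {"h0", "h5", "h6", "h7"}},
        {"h5", {"h1", "h2", "h3", "h4"}}, {"h6", {"h1", "h2", "h3", "h4"}},
        {"h7", {"h1", "h2", "h3", "h4"}}};
    int target = 18;                              // 144 leaders over 8 hosts
    while (shiftOneLeader(leaders, peers, target)) {}
    for (const auto& [h, n] : leaders) std::cout << h << ": " << n << "\n";
}
```

Run on the distribution from this issue (15, 18, 18, 18, 18, 19, 19, 19), three such chain moves end with every host holding exactly 18 leaders.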
@wey-gu
We are observing this imbalance in v3.6.0; below is our cluster info: metad: 3, graphd: 3, storaged: 7, replica factor: 3, number of partitions: 140.
After several BALANCE LEADER attempts:
Expected leader distribution: 20, 20, 20, 20, 20, 20, 20
Actual leader distribution: 26, 26, 27, 15, 14, 17, 15
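For reference, 140 partitions over 7 storaged hosts should give 140 / 7 = 20 leaders per host, so the reported spread of 14 to 27 is far outside a one-leader tolerance. A quick check with the numbers above (illustrative snippet only):

```cpp
// Arithmetic check of the reported v3.6.0 distribution:
// 140 partitions over 7 storaged hosts should give 20 leaders each.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> actual = {26, 26, 27, 15, 14, 17, 15};
    int total = std::accumulate(actual.begin(), actual.end(), 0);     // 140
    int expected = total / static_cast<int>(actual.size());           // 20
    auto [mn, mx] = std::minmax_element(actual.begin(), actual.end());
    std::cout << "expected per host: " << expected
              << ", actual spread: " << *mn << ".." << *mx << "\n";
}
```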
@songqing you have only 8 hosts; aren't you supposed to have an odd number of hosts for Raft?
I think the number of hosts has nothing to do with the leader distribution; both odd and even numbers are fine. The leader balance algorithm is the key problem.
Maybe for the distribution, but aren't you supposed to have an odd number of hosts?
In any case, this leader imbalance is hurting performance very badly on a huge graph. Our space has a total vertex count of 2.8 billion and a total edge count of 1 billion.
The number of metad hosts should be odd; storaged has no such limitation, I think.
Yes, we can have an even number of storage hosts; the thing that should be odd is the replica factor for spaces.
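For completeness, the reason the replica factor is what should be odd is plain Raft arithmetic: a group of n replicas needs a majority of n/2 + 1 votes, so an even n tolerates no more failures than n - 1 replicas would, and the number of storage hosts does not enter into it. A small, Nebula-independent illustration:

```cpp
// Why the replica factor (not the storage host count) should be odd: an even
// replica factor needs a larger quorum without tolerating more failures.
#include <iostream>

int main() {
    for (int replicas = 2; replicas <= 6; ++replicas) {
        int quorum = replicas / 2 + 1;
        int tolerated = replicas - quorum;
        std::cout << "replica factor " << replicas
                  << ": quorum " << quorum
                  << ", tolerates " << tolerated << " failed replica(s)\n";
    }
}
```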