Load-Balancer: Add a "Balance VCPUs" option in performance plan
Context
- xo-server 5.70.0, xo-web 5.74.0
- Considering pool1 with HOST1 (50% CPU) and HOST2 (1% CPU), and the following VMs:
- VM1 (1 vCPU / 2 GB) on HOST1, VM2 (4 vCPUs / 8 GB) on HOST2, and VM3 (4 vCPUs / 8 GB) on HOST2
I configure a load-balancing plan in performance mode on the pool. No critical thresholds are defined.
Expected behavior
I have two big VMs on the same host.
One should be moved to the other host, to end up with:
- HOST1: 25% CPU
- HOST2: 25% CPU
Current behavior
No migration happens....
I found that I have to define a threshold....
Why is this required?
It should be smart and automatically move something when the CPU is used too much.
I had to define a threshold of 50%, but I shouldn't have to do that; shouldn't it be computed live and automatically?
The objective is to live migrate VMs to have a perfect balance between hosts, not to load balance only when the hypervisor is about to die because of a high threshold, am I right?
Because currently I configured:
- performance mode
- thresholds: CPU 50%
But if:
- HOST1: 35%
- HOST2: 0%
nothing happens.
That is not normal.
Third example
- Host1: 90%
- Host2: 89%
What is going to happen? Will it migrate in a loop forever?
Can you please give more detail about this in the documentation?
Best regards
Hi, @Wescoeur is assigned, he'll take a look when he can.
I found that I have to define a threshold.... Why is this required? It should be smart and automatically move something when the CPU is used too much.
What's the meaning of "too much"? Personally I don't have the answer: it's subjective. That is why we leave the choice to the user. By default, using the performance mode, the VMs are migrated when the CPU usage is 80% or higher.
I had to define a threshold of 50%, but I shouldn't have to do that; shouldn't it be computed live and automatically?
Can you provide a way to compute the threshold automatically? Or do you have an example?
The objective is to live migrate VMs to have a perfect balance between hosts, not to load balance only when the hypervisor is about to die because of a high threshold, am I right?
There are two goals:
- In performance mode, the CPU/RAM must be below a threshold to give the best overall performance.
- In density mode, the objective is to use as few hosts as possible and to concentrate your VMs. After that, you can shut down unused hosts.
There is no "perfect balance", why start a migration if CPU usage is below 50% in your case? Not all host resources are used yet. If the 50% limit is not reached, it's useless to migrate.
HOST1: 35%, HOST2: 0%. Nothing happens. It's not normal.
As I said earlier, it's normal. But if the limit had been reached, with one VM representing 40% of the CPU usage and another one 20% (so a total CPU usage of 60%), then the VM with the lower CPU usage would have been migrated.
Host1: 90%, Host2: 89%. What is going to happen? Will it migrate in a loop forever?
No. In performance mode a migration is never executed if there is no benefit. For example, in this case, migrating a VM using 5% of the CPU would create an imbalance, so the VM is not moved. FYI, a VM can only be moved from a host "A" to a host "B" if the CPU usage of "A" remains higher than that of "B" after the migration. This test is performed before the migration: https://github.com/vatesfr/xen-orchestra/blob/6973b92c4acf771700f365c35aad5e5665745fef/packages/xo-server-load-balancer/src/performance-plan.js#L125-L127
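To make that rule concrete, here is a rough sketch (the function name and variables are assumptions for illustration, not the exact code from performance-plan.js linked above):

```js
// Only migrate a VM from source host A to destination host B if A still has
// the higher CPU usage once the VM's own usage is moved over, i.e. the
// migration must not simply invert the imbalance.
function isMigrationWorthIt (srcCpuUsage, dstCpuUsage, vmCpuUsage) {
  return srcCpuUsage - vmCpuUsage >= dstCpuUsage + vmCpuUsage
}

isMigrationWorthIt(90, 89, 5)  // false: 85 < 94, moving the 5% VM would just invert the imbalance
isMigrationWorthIt(60, 10, 20) // true: 40 >= 30, the migration really relieves the source host
```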
Another idea in this plan is to use the CPU statistics of the last 30 minutes with a ratio, to avoid useless migrations when the CPU is used intensely only for a short period of time (in that case it's not useful to migrate): https://github.com/vatesfr/xen-orchestra/blob/6973b92c4acf771700f365c35aad5e5665745fef/packages/xo-server-load-balancer/src/plan.js#L157-L167
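As a minimal sketch of that idea, assuming a simple weighted blend of the current usage and the 30-minute average (the names and exact formula are illustrative, not the actual plan.js implementation):

```js
// Blend the current CPU usage with the average of the last
// MINUTES_OF_HISTORICAL_DATA minutes so that a short spike alone does not
// look like sustained load.
const MINUTES_OF_HISTORICAL_DATA = 30
const CURRENT_USAGE_WEIGHT = 0.75 // hypothetical name for the ratio

// samples: one CPU usage value (0-100) per minute, most recent last
function smoothedCpuUsage (samples) {
  const window = samples.slice(-MINUTES_OF_HISTORICAL_DATA)
  const average = window.reduce((sum, value) => sum + value, 0) / window.length
  const current = samples[samples.length - 1]
  return CURRENT_USAGE_WEIGHT * current + (1 - CURRENT_USAGE_WEIGHT) * average
}

// A host idling at ~10% that spikes to 95% for one minute yields ≈ 74, which
// stays under the default 80% threshold, so no migration is triggered.
console.log(smoothedCpuUsage([...Array(29).fill(10), 95]))
```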
Official doc: https://xen-orchestra.com/docs/load_balancing.html
Hello,
Thank you very much for your explanation!
VMware DRS doesn't ask for a "threshold": we only choose a point between aggressive and passive,
and VMs are automatically moved across hosts to reach such a "perfect" balance.
Sorry, I don't have a way to compute this....
Indeed, there is no point in migrating if the CPU is under 50%; however, the problem is the case where the CPU usage grows very quickly.
The case:
- VM1: 20 cores, 10 GB RAM
- VM2: 20 cores, 10 GB RAM
Currently, if my threshold is 50%, VM2 will be moved once the host CPU is above 50%. But take the case of Black Friday or a TV show: the CPU can grow very fast, a host can end up under heavy load, and the VMs will suffer CPU steal, whereas if the VM had been migrated beforehand, this problem would never occur.
Was that clear for you?
Sorry for my bad English :-)
Conversely, could you maybe propose the reverse of the density mode, to use all hypervisors if possible?
That might answer my problem :-)
I’m not too familiar with VMware DRS. :slightly_smiling_face: I suppose several algorithms are used when the "migration threshold" mode is updated, and the thresholds are set internally (and/or computed using the CPU models and the number of hosts).
However, we can get closer to it on some points:
- If you want an aggressive mode, you can try to use a lower threshold to force migration.*
- We have a similar "AggressiveCPUActive" option (https://blogs.vmware.com/vsphere/2016/05/load-balancing-vsphere-clusters-with-drs.html) in our load balancer, but it's currently hardcoded... As I said in my previous answer, we have this code: https://github.com/vatesfr/xen-orchestra/blob/6973b92c4acf771700f365c35aad5e5665745fef/packages/xo-server-load-balancer/src/plan.js#L157-L167
Currently, if my threshold is 50%, VM2 will be moved once the host CPU is above 50%. But take the case of Black Friday or a TV show: the CPU can grow very fast, a host can end up under heavy load, and the VMs will suffer CPU steal, whereas if the VM had been migrated beforehand, this problem would never occur.
A possible solution could be to make the weight (currently 0.75 in the source code above) and the time interval (currently 30 minutes, see MINUTES_OF_HISTORICAL_DATA) configurable. With a small interval and a big weight, a VM whose CPU is spiky can be migrated more easily.
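As a purely hypothetical illustration of the effect of these two knobs, reusing the same assumed blending formula as in the sketch above (not the real implementation):

```js
// Same spiky load as before: 29 minutes at 10% CPU, then one minute at 95%.
const samples = [...Array(29).fill(10), 95]

const smoothed = (samples, weight, minutes) => {
  const window = samples.slice(-minutes)
  const average = window.reduce((sum, value) => sum + value, 0) / window.length
  return weight * samples[samples.length - 1] + (1 - weight) * average
}

console.log(smoothed(samples, 0.75, 30)) // ≈ 74: current defaults, the spike stays under an 80% threshold
console.log(smoothed(samples, 0.95, 5))  // ≈ 92: small interval + big weight, the spike alone crosses it
```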
*I think the load balancer can be improved :wink:, but for the moment I don't see other solutions to your problem with the current state of the load balancer. You can try with a low threshold; if it's not sufficient for you, we can probably offer the possibility to modify the weight and the time interval.
Hello,
Okay, thank you.
What do you think about my feature proposal? The opposite of the density mode?
With this, I could imagine:
Host1:
- VM1
- VM2
- VM3
- VM4
Host2:
- (empty)
1AM --> 6AM: reverse density mode
VM1 --> Host2
VM2 --> Host2
Every host now has the same number of VMs.
Host1:
- VM3
- VM4
Host2:
- VM1
- VM2
Reverse density mode applied.
6AM --> 1AM: Performance Mode
What do you think about this?
This could answer my need.
@henri9813 The opposite of the density mode could be a good idea, but I think we can add an option directly in the performance mode: we can count the number of vCPUs used by the VMs on each host, and migrate to the hosts with the fewest vCPUs when possible. (So the percentage usage is not used by this algorithm; using the vCPU count, we are sure to balance the VMs correctly, like you want.) In parallel, we must still always respect the thresholds based on the percentage usage.
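A minimal sketch of what such a vCPU-balancing pass could look like (hypothetical names and heuristic, not the implementation that was merged later):

```js
// Count the vCPUs assigned to the VMs of a host.
function vcpuCount (host) {
  return host.vms.reduce((sum, vm) => sum + vm.cpus, 0)
}

// Propose one migration: take the smallest VM of the host with the most
// assigned vCPUs and move it to the host with the fewest, but only if this
// actually reduces the vCPU gap between the two hosts. The usual thresholds
// on real CPU/RAM usage would still be checked in parallel, as said above.
function nextVcpuMigration (hosts) {
  const sorted = [...hosts].sort((a, b) => vcpuCount(b) - vcpuCount(a))
  const source = sorted[0]
  const destination = sorted[sorted.length - 1]

  const vm = [...source.vms].sort((a, b) => a.cpus - b.cpus)[0]
  if (vm === undefined) {
    return undefined
  }

  const gapBefore = vcpuCount(source) - vcpuCount(destination)
  const gapAfter = Math.abs(
    vcpuCount(source) - vm.cpus - (vcpuCount(destination) + vm.cpus)
  )
  return gapAfter < gapBefore ? { vm, source, destination } : undefined
}

// Example from the top of this issue: VM2 and VM3 (4 vCPUs each) on HOST2,
// VM1 (1 vCPU) on HOST1 -> one of the 4-vCPU VMs is proposed for migration.
nextVcpuMigration([
  { id: 'HOST1', vms: [{ id: 'VM1', cpus: 1 }] },
  { id: 'HOST2', vms: [{ id: 'VM2', cpus: 4 }, { id: 'VM3', cpus: 4 }] },
])
```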
Is it clear for you? Do you agree with this proposal?
Hello,
Hmmm, this is very clear, but what you described looks like the reverse of the density mode.
Defining this in the performance plan could be confusing; I think it should have a dedicated plan: "Distributed"?
But your proposal is quite amazing! :D
Best regards
To me, the only difference with "perf" is that it avoids migrating VMs based on performance counters before the VMs are under load. It spreads them to get the best vCPU/CPU ratio on all hosts. It's not incompatible with it, it's more "complementary": you prepare the spread of your VMs based not on their current performance but on their potential (vCPU number and host).
Also, this spread won't be enough on its own; that's why it's only useful to spread VMs when they are under the "load balancing" limit. When there's load, the spread is less relevant because performance counters are what really matters in the end.
It's like "prepositioning" if you prefer.
Hello,
I like the name "prepositioning". So, will it be a dedicated mode, not just a "performance" sub-mode?
Best regards :-)
As I said, it doesn't make sense alone: just "prepositioning" the VMs based on theoretical counters (like the vCPU number) doesn't make sense as soon as you get real load. I mean, what if the VM with the fewest vCPUs on a host is in fact doing all the work (while the others are idle)?
So this feature only makes sense inside a mode based on counters. Otherwise, you'll have a placement that won't reflect the real requirements.
Hello,
After re-reading carefully, I agree with you (my English is limited and I had misunderstood your response).
Best regards :-)
No problem :)
@Wescoeur feel free to create an issue or rename this one, whatever you feel is best.
Hi @henri9813
Can you please point me to a tutorial on how to configure the balancing on my hosts?
I have two hosts under the same XO, and I have a VM that uses 90% of HOST1's resources. I want this VM to also use the resources of HOST2 (CPU & RAM). Is that possible?
No, it's not possible. A single VM can't use resources of 2 hosts.
Hello,
Do you have any news about the planning for this?
Best regards,
Pinging @Wescoeur about this
Hello @henri9813, we have many important tasks to do before that (XCP-ng maintenance, DRBD/Linstor driver improvements, ...). Because it is not a complex problem, I think I can probably look at your problem in detail in a few weeks :wink:.
Don't hesitate to ping me if I don't give any news!
@Wescoeur can brief you, @b-Nollet, so it's easier to get the context and move forward on this related milestone :)
Done in #7333