
Wrong energy consumption with multi-core VM

Open bcamus opened this issue 7 years ago • 6 comments

When at least one task is executed within a multi-core VM, the energy consumption is computed as if the VM systematically used all of its cores, even if fewer cores are actually used for computation.

Attached is an MWE that shows this issue:

  • We consider a 16-core PM that has a speed of 1 flop/s and consumes 0 W when idle, 1 W when using 1 core, and 16 W when using all of its 16 cores.

  • We deploy an 8-core VM on the PM.

  • We run a single task of 10 flops within the VM.

  • The task ends after 10 sec, which is the expected behavior.

  • However, during that time the PM consumed 80 J (8 W × 10 s, as if all 8 cores of the VM were used) instead of the expected 10 J (1 W × 10 s for the single busy core).

multi-core-vm.zip
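
For reference, the attached reproducer boils down to something like the following S4U sketch (the platform file is assumed to declare the 0 W / 1 W / 16 W wattage profile of the PM, the host name "PM" and file names are placeholders, and the exact VM-creation call may differ between SimGrid versions):

#include <simgrid/plugins/energy.h>
#include <simgrid/s4u.hpp>
#include <cstdio>

static void worker()
{
  simgrid::s4u::this_actor::execute(10); // the single 10-flop task, run inside the VM
}

int main(int argc, char** argv)
{
  sg_host_energy_plugin_init();  // enable per-host energy accounting
  simgrid::s4u::Engine e(&argc, argv);
  e.load_platform(argv[1]);      // 16-core PM at 1 flop/s with the 0/1/16 W profile

  simgrid::s4u::Host* pm = simgrid::s4u::Host::by_name("PM");
  auto* vm = new simgrid::s4u::VirtualMachine("VM", pm, 8); // the 8-core VM on the PM
  vm->start();

  simgrid::s4u::Actor::create("worker", vm, worker);
  e.run(); // the task ends after 10 s, as expected

  // Expected: 1 W * 10 s = 10 J; observed: 8 W * 10 s = 80 J
  std::printf("PM consumed %g J\n", sg_host_get_consumed_energy(pm));
  return 0;
}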

bcamus avatar Sep 11 '18 09:09 bcamus

Hello, thanks for the report.

After discussing with @alegrand, we have the feeling that the code computing the sharing of both LMMs is simply too complex to be reliable. In addition, ignore_empty_vm_in_pm_LMM() must be an ugly performance killer that should be removed. Instead, things should certainly be implemented as follows. I'm writing it down here because I doubt I will find the time to fix it myself any time soon, I don't want to forget it in the meantime, and I'd love it if someone else did it before me.

Context: existing code

When computing this sharing, there are three important notions: needs (what the activities want to consume in a perfect world), availability (what the resources can afford), and sharing (what the activities actually get from the resources).

There are several LMMs in any simulation: physical resources (net, disk, cpu, host) and virtual ones (cpu, host), plus any others the user may define.

Here is the sharing algorithm, which occurs at each scheduling round (a rough sketch follows the list):

  • ignore_empty_vm_in_pm_LMM() computes the need of each VM by counting the tasks it contains (ignoring the suspended ones).
  • The sharing is computed on the physical resources (considering VMs as execution tasks running on the PMs).
  • The sharing is then computed at the VM level, to distribute what each VM got from its PM among the tasks it contains.
  • The sharing is finally computed in the user models.
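
In pseudo-code, one scheduling round therefore looks roughly like this (illustrative only: all_vms, set_need_on_pm(), count_active_tasks() and the other helper names are made up for the explanation, they are not the actual SimGrid code):

// One scheduling round, as described in the list above (rough sketch).
void share_resources_one_round()
{
  // 1. Derive each VM's need on its PM from the tasks it currently contains
  //    (this is what ignore_empty_vm_in_pm_LMM() recomputes from scratch today).
  for (VirtualMachineImpl* vm : all_vms)
    set_need_on_pm(vm, count_active_tasks(vm)); // suspended tasks do not count

  // 2. Solve the physical LMM: each VM appears as a plain execution action on its PM.
  physical_lmm->solve();

  // 3. Solve each VM-level LMM: what a VM obtained from its PM is shared
  //    among the tasks it contains.
  for (VirtualMachineImpl* vm : all_vms)
    vm_lmm(vm)->solve();

  // 4. Solve the user-defined models, if any.
  for (auto* model : user_models)
    model->solve();
}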

Proposed roadmap

  • [x] Add a new field in VirtualMachineImpl, counting the active tasks, initialized to 0
  • [x] When a VM is created, make sure that the execution action_ representing it does not require anything on the PM, since the VM does not contain any task yet. In the VirtualMachineImpl::VirtualMachineImpl() constructor, add this:
    // It's empty for now, so it should not request resources in the PM
    action_->get_model()->get_maxmin_system()->update_variable_weight(action_->get_variable(), this->active_tasks_);
    
    Resist the urge to improve the OOP of the maxmin module for now :-/
  • [ ] In VMModel::VMModel(), add callbacks to the signals ExecImpl::on_creation and ExecImpl::on_completion updating active_tasks_ and calling variable->update_variable_weight() (only if exec->host_ is a VM)
  • [ ] Do something similar in a new callback to the ExecImpl::on_migration
  • [x] Create signals ExecImpl::on_suspended and ExecImpl::on_resumed, and add VM callbacks to update the active_tasks_
  • [ ] Make sure that exec->cancel() actually fires the on_completion signal. I think so from reading the code, but it should be actually tested.
  • [ ] Create a signal ExecImpl::on_bound_change to let the VM react to a set_bound(), and actually do so. The bound is expressed in flops and we need to compute how many cores are actually needed, so we have to sum the ratios of consumption here to compute the VM need (if you have 3 tasks each running at 0.5 of their core speed, you need an active_tasks_ of 1.5; see the sketch after this list). We may also have to react to availability changes with the Host::on_speed_change signal.
  • [x] Remove ignore_empty_vm_in_pm_LMM() altogether. The variable weight should now be up to date at every point, so there is no need to brutally recompute it all the time.
  • [ ] Extend the test teshsuite/msg/cloud-sharing so that each test checks the amount of dissipated energy, to ensure that this very bug is gone with the code simplification.
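
To make the ratio computation of the on_bound_change item concrete, here is a rough sketch of what the VM-need computation could look like (compute_vm_need(), task_bounds and core_speed are made-up names, not existing SimGrid code):

#include <algorithm>
#include <vector>

// Illustrative sketch only: derive the VM need (in core-equivalents) from the
// bounds of its tasks, as in the 3-tasks-at-0.5 example above.
static double compute_vm_need(const std::vector<double>& task_bounds, // bound requested by each task
                              double core_speed,                      // speed of one VM core
                              int vm_core_count)
{
  double need = 0.0;
  for (double bound : task_bounds)
    need += std::min(bound / core_speed, 1.0); // assuming each task uses at most one core
  return std::min(need, static_cast<double>(vm_core_count)); // a VM cannot need more cores than it has
}

// Example from the item above: 3 tasks, each bounded to half the core speed
// => need = 3 * 0.5 = 1.5 active tasks, i.e. 1.5 cores requested on the PM.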

Like I said, any help to solve this is really welcome...

mquinson avatar Sep 13 '18 06:09 mquinson

For info, @bcamus is working on this on his fork.

mquinson avatar Sep 20 '18 08:09 mquinson

Ok, I implemented all the features of the roadmap in my fork (except for exec->cancel() and ExecImpl::on_bound_change). I also made the cloud-sharing test compliant with S4U and moved it to teshsuite/s4u/cloud-sharing.

I still don't have the correct energy consumption for multi-core VMs (you can see that in cloud-sharing). All the other tests succeed. This is very strange because I always get the correct execution times (i.e. the tasks always get the expected CPU time) but not always the correct consumption (i.e. sometimes, too many cores are used on the host).

For instance:

  • With the case ( [o]2 )2, we do not get the expected consumption, but we do get the expected execution time -- i.e. when putting a 2-core VM on a 2-core PM and adding a single task to the VM, 2 cores of the PM are used according to the energy consumption, but according to its execution time, the task gets only one core.

  • With the case ( [o]2 o )2, we get both the expected execution time and the expected energy consumption -- i.e. when a task is also added directly on the PM, each task gets one core of the PM.

It may be interesting to note that I do not update action_ as specified in the roadmap, because doing so leads to incorrect execution times. Instead, I kept the logic that was already there -- i.e. I do not use:

action_->get_model()->get_maxmin_system()->update_variable_weight(action_->get_variable(), this->active_tasks_);

but use instead:

// The impact of the VM on its PM is the minimum between its core amount
// and the number of active tasks it contains; an empty VM must not
// request anything from the PM.
int impact = std::min(active_tasks_, get_core_amount());
if (impact > 0)
  action_->set_priority(1. / impact);
else
  action_->set_priority(0.);

bcamus avatar Sep 25 '18 15:09 bcamus

Hello @bcamus, could you please update our status w.r.t. this bug now that #316 is merged?

Thanks, Mt

mquinson avatar Nov 21 '18 10:11 mquinson

Hello,

Here are the remaining tasks to do to close this bug:

  • [ ] In VMModel::VMModel(), add a callback to the signal ExecImpl::on_migration updating active_tasks_ and calling variable->update_variable_weight() (only if exec->host_ is a VM). ExecImpl::on_migration should be modified so that both the source and the destination are known.

  • [ ] Make sure that exec->cancel() actually fires the on_completion signal. I think so from reading the code, but it should be actually tested.

  • [ ] Create a signal ExecImpl::on_bound_change to let the VM react to a set_bound(), and actually do so. The bound is expressed in flops and we need to compute how many cores are actually needed, so we have to sum the ratios of consumption here to compute the VM need (if you have 3 tasks each running at 0.5 of their core speed, you need an active_tasks_ of 1.5; see the sketch under the roadmap above). We may also have to react to availability changes with the Host::on_speed_change signal.

bcamus avatar Nov 21 '18 12:11 bcamus

About the remaining stuff:

  • why not use the on_migration signal of the VM itself?
  • exec->cancel() does not fire the on_completion signal, but it's one line to add (same as for exec->wait() a few lines below); a sketch follows.
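
Something along these lines should do for the cancel() part, mirroring the normal completion path (sketch only: the exact place and the signal's signature must be checked against the actual code):

// In the kernel code handling exec->cancel() (sketch, to be adapted):
void ExecImpl::cancel()
{
  /* ... existing cancellation logic ... */
  on_completion(this); // fire the same signal as on the normal completion path,
                       // so that the VM callbacks can decrement active_tasks_
}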

frs69wq avatar Nov 21 '18 13:11 frs69wq