
move qmp handling to neonvm-runner

Status: Open. sharnoff opened this issue 7 months ago • 4 comments

Problem description / Motivation

As discussed here.

There are a few reasons for this:

- neonvm-controller sleeps during reconcile while waiting for QEMU; we'd like to avoid sleeps there.
- Subscribing to QEMU events would be hard (but not impossible) in neonvm-controller because of its execution and data model, but we can make it easy in neonvm-runner. See also: #327
- Exposing the QMP port cluster-wide is a potential security hole. See also: #414

Feature idea(s) / DoD

QMP is inaccessible outside the runner pod; neonvm-runner is exclusively responsible for making the CPU/memory changes that the controller requests (+ starting migration?).

Implementation ideas

Some considerations need to be made w.r.t. #738, if we end up merging that PR. But the general idea should be to expose an HTTP server in neonvm-runner that handles:

- Returning the current CPU/memory
- Changing CPU/memory to the desired values

Handling this in neonvm-runner requires bumping the "runner version".

We'd probably also end up removing the QMP port from the VM spec; special care needs to be taken to phase it out gradually (and to make sure cplane doesn't set it for new VMs).
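During the rollout, the controller would need to pick a resize path per VM based on the runner version it reports, keeping the direct-QMP path only for older runners until they are phased out. A hedged sketch of that check, where the version numbers, the `resizePath` values, and the `choosePath` helper are all hypothetical:

```go
package main

import "fmt"

// Hypothetical runner versions; the real numbering in neonvm may differ.
const (
	runnerVersionQMP     = 1 // controller talks to QMP directly
	runnerVersionHTTPAPI = 2 // runner exposes the resize HTTP API
)

type resizePath string

const (
	viaQMP    resizePath = "qmp"
	viaRunner resizePath = "runner-http"
)

// choosePath picks how the controller should resize a VM whose runner
// reports the given version. Older runners keep the direct-QMP path
// until they are gone, after which the QMP port can leave the VM spec.
func choosePath(runnerVersion int) resizePath {
	if runnerVersion >= runnerVersionHTTPAPI {
		return viaRunner
	}
	return viaQMP
}

func main() {
	for _, v := range []int{runnerVersionQMP, runnerVersionHTTPAPI} {
		fmt.Printf("runner version %d -> %s\n", v, choosePath(v))
	}
}
```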

Jan 16 '24 17:01 sharnoff
