
move qmp handling to neonvm-runner

Open sharnoff opened this issue 7 months ago • 4 comments

Problem description / Motivation

As discussed here.

There are a few reasons for this:

  1. neonvm-controller sleeps during reconcile while waiting for QEMU; we'd like to avoid sleeps there
  2. Subscribing to QEMU events would be hard (but not impossible) in neonvm-controller because of its execution & data model, but we can make it easy for neonvm-runner. See also: #327
  3. Exposing the QMP port cluster-wide is a potential security hole. See also: #414

Feature idea(s) / DoD

QMP is inaccessible outside the runner pod; neonvm-runner is exclusively responsible for making the CPU/memory changes that the controller requests (+ starting migration?).

Implementation ideas

Some considerations need to be made w.r.t. #738, if we end up merging that PR. But the general idea should be to expose an HTTP server that handles:

  1. Returning current CPU/memory
  2. Changing CPU/memory to desired values

This requires bumping the "runner version" to handle that.

We'd probably also end up getting rid of the QMP port from the VM spec. Special care needs to be taken to phase that out gradually (+ make sure cplane doesn't set it for new VMs).

sharnoff, Jan 16 '24 17:01