
feat: recreate idle runners on pool image update

Open kaspar030 opened this issue 8 months ago • 5 comments

IIUC, if a pool image is updated (garm-cli pool update $POOL_ID --image=${TARGET_IMAGE}), already running runners are not recreated, at least if the image has the same name. It would be great if there were an option for this.

kaspar030 avatar Apr 24 '25 18:04 kaspar030

Hi @kaspar030 !

Yes. That was by design to avoid potentially rotating pools that have many idle runners set. But you're right. A quick way to do this would be nice.

gabriel-samfira avatar Apr 24 '25 19:04 gabriel-samfira

I've been thinking about this request. There are currently at least two pool options that change the specs of runners in a pool, and potentially a third (depending on the provider):

  • Flavor
  • Image
  • Extra specs (depending on provider)

Changing any one of these options will not affect existing idle runners. Existing runners are kept as they are to avoid situations in which you have a large number of idle runners, someone changes one of these options, and all idle runners get recreated.

There are use cases where users create pools with min-idle-runners and max-runners set to the same value. This is feasible on providers like k8s or lxd/incus, where the containers themselves consume very little and you get a large number of ready to use runners.

Right now, if you want to rotate existing idle runners, you need to manually remove each one individually. This is also by design to avoid accidentally removing all runners from a pool. When removing a runner, two things happen:

  • The runner is removed from GitHub. This is also a validation test, because if a runner picks up a job while we are trying to remove it, GitHub will return a bad request status code. We don't want to remove the VM/container while it's running a job, because that would cancel the job.
  • If we remove the runner from GitHub successfully, we mark it as pending_delete and the runner is removed by GARM.
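The ordering of the two steps above can be sketched roughly as follows. This is a minimal illustration, not GARM's actual code: `Instance`, `DeregisterFunc`, `ErrRunnerBusy`, and `RemoveIdleRunner` are hypothetical names, and the deregistration hook stands in for whatever call removes the runner from GitHub.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRunnerBusy is a hypothetical sentinel standing in for the bad request
// status GitHub returns when the runner has already picked up a job.
var ErrRunnerBusy = errors.New("runner is busy with a job")

// Instance is a hypothetical, simplified runner record.
type Instance struct {
	Name   string
	Status string
}

// DeregisterFunc is a hypothetical hook that removes a runner from GitHub.
type DeregisterFunc func(name string) error

// RemoveIdleRunner deregisters the runner first and marks the instance as
// pending_delete only when deregistration succeeded.
func RemoveIdleRunner(deregister DeregisterFunc, inst *Instance) error {
	if err := deregister(inst.Name); err != nil {
		// Leave the instance untouched: it may be running a job, and
		// deleting the VM/container now would cancel that job.
		return fmt.Errorf("not removing %s: %w", inst.Name, err)
	}
	inst.Status = "pending_delete"
	return nil
}
```

Because the status only changes after a successful deregistration, a runner that picks up a job in the meantime keeps its VM/container.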

Since GARM automatically replaces an idle runner once it is removed, we want to avoid a situation in which we update the image, then decide to also update the flavor or extra specs, then change our minds and set a new image again, while GARM rotates every runner in the pool after each update we make.

So here is what I propose. We can add a new subcommand to pools, which will explicitly request that the pool rotate all runners. That way, once all changes are done, we use that command as a confirmation step that all updates to the pool are final and it's okay to replace all the runners.

Does that sound acceptable?

gabriel-samfira avatar May 09 '25 20:05 gabriel-samfira

> Does that sound acceptable?

That's essentially what I'm doing manually now, and it does sound acceptable. Would it be possible to rotate only those runners whose options have changed?

kaspar030 avatar May 09 '25 20:05 kaspar030

Yes. We can add a new column to the Instance table where we can save some arbitrary data, and include creation specs (saves us from having to query the provider to get the specs of the VMs) and anything else we need. We can compare those with what the pool currently has and can decide which runners to rotate. We can have something like:

# Rotate all idle runners
garm-cli pool rotate-runners POOL_ID

# Rotate only outdated idle runners
garm-cli pool rotate-runners --outdated-only POOL_ID

This might take a while to get to, however.
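The comparison described above could look something like the following sketch. It assumes each instance stores the specs it was created with; `Specs`, `Pool`, `Instance`, `IsOutdated`, and `RunnersToRotate` are hypothetical names, not GARM's data model, and the `outdatedOnly` parameter mirrors the proposed `--outdated-only` flag. Note that comparing names alone would not catch an image that was updated under the same name.

```go
package main

// Specs is a hypothetical record of the options that shape a runner.
type Specs struct {
	Image      string
	Flavor     string
	ExtraSpecs string // raw provider-specific data; compared verbatim here
}

// Pool is a hypothetical, simplified pool holding the current options.
type Pool struct {
	Specs Specs
}

// Instance is a hypothetical runner record that remembers its creation specs.
type Instance struct {
	Name        string
	Status      string // e.g. "idle" or "running"
	CreatedWith Specs
}

// IsOutdated reports whether the instance was created with options that
// differ from what the pool currently has.
func IsOutdated(p Pool, inst Instance) bool {
	return inst.CreatedWith != p.Specs
}

// RunnersToRotate returns the names of idle runners to replace. With
// outdatedOnly set, idle runners that still match the pool are skipped.
func RunnersToRotate(p Pool, instances []Instance, outdatedOnly bool) []string {
	var names []string
	for _, inst := range instances {
		if inst.Status != "idle" {
			continue // never rotate a runner that may be running a job
		}
		if outdatedOnly && !IsOutdated(p, inst) {
			continue
		}
		names = append(names, inst.Name)
	}
	return names
}
```

Storing the creation specs on the instance keeps this check local, so the provider does not have to be queried for each VM.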

gabriel-samfira avatar May 09 '25 20:05 gabriel-samfira

Great, thanks already for thinking this through!

kaspar030 avatar May 09 '25 20:05 kaspar030