snabblab-nixos

Workflow for deploying changes to lab server

Open domenkozar opened this issue 9 years ago • 5 comments

Here are the scenarios in which we'd like to handle lab server modifications:

  • deploying a change to all lab servers (as an admin)
  • deploying a change to all lab servers (via PR)
  • deploying a change to a subset of lab servers (as a temporary change for the duration of an experiment)

Proposal:

Once we have Hydra set up on a supporting server (see #8), it would poll snabblab-nixos for git changes and build the whole machine cluster. If the build is successful, the channel would update. Meanwhile, all lab servers would poll for the latest channel every 15 min and upgrade if a new channel is available.
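On the lab server side, the polling could be sketched with the stock `system.autoUpgrade` module. This is only a minimal sketch, not the final setup: the channel URL is a placeholder, and it assumes the custom channel can be consumed like a regular NixOS channel.

```nix
# Hypothetical NixOS module fragment; the channel URL is a placeholder.
{ ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Placeholder: point this at the channel that Hydra publishes.
    channel = "https://hydra.example.org/channel/custom/lab/latest";
    # systemd calendar expression: run at :00, :15, :30 and :45 of every hour.
    dates = "*:0/15";
  };
}
```

With this in place each server checks the channel on a timer, so nothing has to push to the machines.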

Pros

  • by building the whole machine we can be sure there are no Nix eval errors
  • changes can still be deployed outside the workflow, but the next channel update will reset that state
  • no need for shared space; git is the place where changes really happen

Cons

  • slight delay between a change and its deployment (usually shouldn't take more than 30 min)

Questions

How can the change be tested locally before it's deployed?

By choosing a different NixOps backend, it could be deployed into VirtualBox or QEMU via libvirt.
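As a rough illustration of switching the backend, a separate NixOps network expression could reuse the shared lab module and only change `deployment.targetEnv`. The file name, machine name and memory size below are made up for the example, and it assumes the lab module evaluates without the real hardware.

```nix
# lab-local.nix: hypothetical NixOps network expression for local testing.
{
  network.description = "snabblab local test";

  # "testbox" is a made-up machine name.
  testbox = { ... }: {
    imports = [ ./modules/lab-configuration.nix ];

    # Deploy into a local VirtualBox VM instead of a real lab server.
    deployment.targetEnv = "virtualbox";
    deployment.virtualbox.memorySize = 2048; # MiB

    # For QEMU via libvirt, use this instead:
    # deployment.targetEnv = "libvirtd";
  };
}
```

This would be created and deployed with something like `nixops create ./lab-local.nix -d lab-local` followed by `nixops deploy -d lab-local`.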

Another option is to apply the change to only one server, test it, then move it to modules/lab-configuration.nix.

cc @lukego

domenkozar, Feb 17 '16 19:02

Whatever we decide, we should have a boot test to avoid issues like https://github.com/NixOS/nixpkgs/issues/12949
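A boot test could look roughly like the sketch below. It assumes a recent nixpkgs that provides `pkgs.testers.runNixOSTest`; the test name is made up, and the import of the lab module is left commented out because it may need adjusting to evaluate inside a VM.

```nix
# boot-test.nix: minimal sketch of a NixOS VM boot test (assumes a recent nixpkgs).
{ pkgs ? import <nixpkgs> { } }:

pkgs.testers.runNixOSTest {
  name = "lab-boot"; # hypothetical test name

  nodes.machine = { ... }: {
    # Hypothetical: pull in the shared lab configuration here.
    # imports = [ ./modules/lab-configuration.nix ];
  };

  # Boot the VM and wait until it reaches multi-user.target.
  testScript = ''
    machine.start()
    machine.wait_for_unit("multi-user.target")
  '';
}
```

If Hydra built this as one of the jobs the channel depends on, a configuration that fails to boot in the VM would block the channel update.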

domenkozar, Mar 17 '16 14:03

:+1: sounds good!

lukego, Mar 20 '16 12:03

I did my research over the weekend (I'm excited about this, since it will reduce the room for human error on my part during deployment).

Aszlig implemented this for his cluster and upstreamed the Hydra part. Here are the necessary parts:

domenkozar, Apr 18 '16 09:04

A preliminary implementation is now in the customchannel branch and on Snabb Hydra.

domenkozar, May 03 '16 23:05

The prototype plan is to deploy the build-{1,2,3,4} machines using this workflow, then gradually switch over, first one lugano server and then the rest of the lab.

The channel is being generated at https://hydra.snabb.co/eval/674#tabs-new

What's left to do for prototype:

  • [x] figure out how the Hetzner NixOps backend generates filesystem units
  • [x] refactor deployment code to handle nixops and channels
  • [x] test a single machine rebuild & restart
  • [x] be able to deploy an auto-upgradable channel using nixops
  • [ ] merge the two nixops deployments (eiger + lab)

For later on:

  • [ ] NixOS test
  • [x] channel versioning (nixpkgs+custom)
  • [ ] monitor upgrade logs
  • [ ] document this workflow upstream

domenkozar, May 05 '16 19:05