snabblab-nixos
                                
Workflow for deploying changes to lab server
Here are the scenarios in which we'd like to handle lab server modifications:
- deploying a change to all lab servers (as an admin)
- deploying a change to all lab servers (via PR)
- deploying a change to a subset of lab servers (as a temporary change for the duration of an experiment)
Proposal:
Once we have Hydra set up on a supporting server (see #8), it would poll snabblab-nixos for git changes and build the whole machine cluster. If the build succeeds, the channel would update. Meanwhile, all lab servers would poll for the latest channel every 15 minutes and upgrade if a new channel is available.
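To make the pull side of the proposal concrete, here is a minimal sketch of what each lab server's configuration could contain, assuming the stock NixOS `system.autoUpgrade` module is used for polling. The channel URL is a placeholder, not the real Hydra job path, and the final implementation may wire this up differently.

```nix
# Sketch only: a NixOS module fragment for a lab server.
{ config, pkgs, ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Placeholder URL (assumption); substitute the channel that Hydra publishes.
    channel = "https://hydra.example.org/channel/custom/snabblab/latest";
    # systemd calendar expression: run at minutes 0, 15, 30 and 45 of every hour.
    dates = "*:0/15";
  };
}
```

With this in place, a server only changes state when a new channel has been published, which is what ties deployments to successful Hydra builds.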
Pros
- by building the whole machine we can be sure there are no Nix eval errors
- changes can still be deployed outside the workflow, but next channel update will reset that state
- no need for shared space, git is the place where changes really happen
Cons
- a slight delay between a change and its deployment (it usually shouldn't take more than 30 minutes)
Questions
How can the change be tested locally before it's deployed?
By choosing a different NixOps backend, it could be deployed into VirtualBox or QEMU via libvirt.
Another option is to apply the change to only one server, test it, then move it to modules/lab-configuration.nix.
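As a rough illustration of the first option, a separate NixOps network file could point the same lab configuration at the VirtualBox backend. This is a sketch under assumptions: the machine name `lab-test` and the memory size are made up, and the relative path assumes the file lives at the repository root next to `modules/`.

```nix
# Sketch only: a NixOps network expression for local testing.
{
  network.description = "snabblab local test";

  # "lab-test" is a hypothetical machine name used for illustration.
  lab-test = { config, pkgs, ... }: {
    # Reuse the shared lab configuration from this repository.
    imports = [ ./modules/lab-configuration.nix ];

    # Deploy into a local VirtualBox VM instead of a physical server.
    deployment.targetEnv = "virtualbox";
    deployment.virtualbox.memorySize = 2048; # in MiB, arbitrary value
    deployment.virtualbox.headless = true;
  };
}
```

Setting `deployment.targetEnv = "libvirtd"` instead would target QEMU via libvirt. Hardware-specific parts of the lab configuration may need to be disabled for a VM.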
cc @lukego
Whatever we decide, we should have a boot test to avoid issues like https://github.com/NixOS/nixpkgs/issues/12949
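A boot test could look roughly like the sketch below, assuming a nixpkgs version that provides `pkgs.nixosTest` with a Python test script (older releases use a different entry point and a Perl script). The test name and the import path are assumptions, and a VM test like this only shows that the configuration reaches `multi-user.target` in QEMU, not that it boots on the real hardware.

```nix
# Sketch only: a NixOS VM test that checks the lab configuration boots.
{ pkgs ? import <nixpkgs> { } }:

pkgs.nixosTest {
  name = "lab-boot"; # hypothetical test name

  nodes.machine = { ... }: {
    # Assumes the shared configuration evaluates inside a VM.
    imports = [ ./modules/lab-configuration.nix ];
  };

  testScript = ''
    machine.start()
    # Fails the test if the system never finishes booting.
    machine.wait_for_unit("multi-user.target")
  '';
}
```

Adding such a test as a Hydra job would let the channel update only after the boot check passes.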
:+1: sounds good!
I did some research over the weekend (I'm excited about this, as it will reduce the room for human error on my side during deployment).
Aszlig implemented this for his cluster and upstreamed the Hydra part. Here are the necessary parts:
A preliminary implementation is now in the customchannel branch and on Snabb Hydra.
The prototype plan is to deploy the build-{1,2,3,4} machines using this workflow, then gradually switch over one lugano server and then the rest of the lab.
The channel is being generated at https://hydra.snabb.co/eval/674#tabs-new
What's left to do for the prototype:
- [x] figure out how the Hetzner NixOps backend generates filesystem units
- [x] refactor deployment code to handle nixops and channels
- [x] test a single machine rebuild & restart
- [x] be able to deploy auto-upgradable channel using nixops
- [ ] merge the two nixops deployments (eiger + lab)
For later on:
- [ ] NixOS test
- [x] channel versioning (nixpkgs+custom)
- [ ] monitor upgrade logs
- [ ] document this workflow upstream