
Evaluate & Build on target machine

Open Mic92 opened this issue 2 years ago • 10 comments

I have a large number of machines, and evaluating all of them on my laptop is too slow. I also don't want to download all the packages that my servers need just to upload them again over my slow local connection. Is there a way to do evaluation on the target host?

Mic92 avatar Jul 16 '21 08:07 Mic92

This is what I am using right now instead: https://github.com/Mic92/doctor-cluster-config/blob/master/fabfile.py

Mic92 avatar Jul 16 '21 08:07 Mic92

I know that some machines are not powerful enough to evaluate locally. However, in that case it is still faster to log in to a faster machine and run nixos-rebuild there:

$ ssh strong-machine nixos-rebuild switch --flake ${targetPath}/dotfiles#weakmachine --build-host localhost --target-host root@weakmachine

Mic92 avatar Jul 16 '21 08:07 Mic92

I was thinking about remote eval as well. Currently, most parts of the deployment process are already host-agnostic with implementations for "local" and "SSH" hosts abstracted out, and we can hopefully add evaluation to that list as well.

The problem with remote eval is that we need to define a "boundary" for the configurations that will be copied to the remote host. This is simple with Flake URIs, but then only remote flakes (github:, https://, but not path:) will work.

zhaofengli avatar Jul 17 '21 05:07 zhaofengli

I would do the following: copy all flake inputs to the evaluation target with nix copy, and then rsync the main flake to a fixed directory. Why rsync? If you make many small changes, rsync is a lot faster than nix copy.

It could sync the main flake to /var/lib/colmena/. This also has the advantage that one can run nixos-rebuild --flake /var/lib/colmena on the host without having to rely on colmena. This is quite useful for rescuing a machine that no longer has network access, or when you have to repair it with nixos-install.
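
A minimal sketch of that flow, not part of Colmena. The names run, sync_and_rebuild and DRY_RUN are hypothetical, and it swaps in nix flake archive --to for the "nix copy the inputs" step, which copies the flake together with its inputs to the target's store. By default it only prints the commands it would run.

```shell
# Print the command when DRY_RUN is unset or 1; execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: SSH destination, $2: nixosConfigurations attribute name,
# $3: directory on the target (assumed default: /var/lib/colmena).
sync_and_rebuild() {
  host=$1
  node=$2
  dir=${3:-/var/lib/colmena}
  # Copy the flake and all of its inputs into the target's Nix store.
  run nix flake archive --to "ssh://${host}" .
  # Sync the working tree; rsync only transfers what changed.
  run rsync -az --delete ./ "${host}:${dir}/"
  # Evaluate, build and activate on the target itself.
  run ssh "$host" nixos-rebuild switch --flake "${dir}#${node}"
}
```

Calling sync_and_rebuild root@web1 web1 prints the three commands; DRY_RUN=0 sync_and_rebuild root@web1 web1 would actually run them.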

Mic92 avatar Jul 17 '21 05:07 Mic92

My remote build script looks like this:

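#!/usr/bin/env bash
# Build this host's system closure on a remote builder, then copy the result back.

set -x

# Exactly one argument is required: the SSH host to build on.
[ $# -ne 1 ] && echo "usage: build-remotely BUILDHOST" >&2 && exit 1

target_configuration=$(hostname)
remote_builder=$1

export NIX_SSHOPTS="-oStrictHostKeyChecking=no"

# Evaluate locally to get the .drv path of the system closure.
drv=$(nix --pure-eval eval --raw ".#nixosConfigurations.${target_configuration}.config.system.build.toplevel.drvPath")

# Copy the derivation to the builder, letting it substitute what it can (-s).
nix copy -s --derivation "$drv" --to "ssh://$remote_builder"

# Realise the derivation on the builder; nix-store -r prints the output path.
remote_result=$(ssh "$remote_builder" nix-store -r "$drv")

# Copy the build result back into the local store.
nix copy --no-check-sigs --from "ssh://$remote_builder" "$remote_result"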

In colmena's case, the nix copy destination would obviously be the target host.

I'm just now checking out colmena. If it suits my use case nicely, I might try to implement remote building support.

htr avatar Jul 22 '21 14:07 htr

@htr

> I'm just now checking out colmena. If it suits my use case nicely, I might try to implement remote building support.

Remote building is already supported by Colmena, which simply uses Nix's native distributed building functionality. You can have a file like:

ssh://builder@host aarch64-linux /path/to/your/ssh.key 16 2 kvm,big-parallel

and then specify it as the meta.machinesFile in your config. Colmena will then pass the contents of the file in --builders to Nix (#21). If you want, you can also set them globally in nix.buildMachines. Nix will do the same thing your script would (copying the derivation as well as the input closure to a remote machine then copying the results back), but with support for multiple builders and basic scheduling.
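
For reference, the columns in such a line are, in order: store URI, system type, SSH identity file, maximum parallel jobs, speed factor, and supported features. A small sketch that assembles one entry follows; builder_line is a hypothetical helper, not something Colmena or Nix provides.

```shell
# Assemble one machines-file entry from its six positional fields.
builder_line() {
  uri=$1        # store URI, e.g. ssh://builder@host
  system=$2     # system type, e.g. aarch64-linux
  key=$3        # path to the SSH identity file
  max_jobs=$4   # maximum number of builds to run in parallel
  speed=$5      # speed factor, used for scheduling
  features=$6   # comma-separated supported features
  printf '%s %s %s %s %s %s\n' \
    "$uri" "$system" "$key" "$max_jobs" "$speed" "$features"
}
```

For example, builder_line ssh://builder@host aarch64-linux /path/to/your/ssh.key 16 2 kvm,big-parallel >> machines appends the entry shown above to a file named machines (the file name is an assumption).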

Remote evaluation, however, is another thing which needs to be investigated.

zhaofengli avatar Jul 23 '21 04:07 zhaofengli

@zhaofengli It would still be great to be able to rebuild on each specific host (instead of one designated builder). Pretty sure none of the other Nix deployment tools can do this either.

bahlo avatar Nov 04 '21 19:11 bahlo

This approach is also interesting. It basically uses your CI to pull store paths: https://determinate.systems/posts/hydra-deployment-source-of-truth This way one does not need to re-evaluate on the target. With Hydra, evaluation happens in parallel.

Mic92 avatar Nov 05 '21 12:11 Mic92

Remote building is now supported in the unstable branch. If enabled, Colmena will copy system profile derivations to the target nodes and initiate the builds there. There is no need to configure designated builders beforehand.

Unlike the native distributed build feature in Nix, this avoids copying the build results back, and can hopefully make it easier to use Colmena on bandwidth-constrained machines as well as on macOS, which is now a supported platform.

It can be enabled by:

  • Setting deployment.buildOnTarget = true; in the node configuration, or
  • Passing --build-on-target on the command line. This overrides all deployment.buildOnTarget configurations for this run. You can also temporarily disable the feature for all nodes with --no-build-on-target.
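
As a rough illustration of the command-line variants, the sketch below only prints the colmena invocation for a given mode. colmena_apply_cmd is a hypothetical helper, and the node names in the usage note are made up; --on is Colmena's node selection flag.

```shell
# Print the colmena invocation for a build mode.
# $1: "target" (force building on the nodes), "local" (force it off),
#     anything else leaves each node's deployment.buildOnTarget in effect.
# $2: optional comma-separated node list passed to --on.
colmena_apply_cmd() {
  mode=$1
  nodes=${2:-}
  cmd="colmena apply"
  case "$mode" in
    target) cmd="$cmd --build-on-target" ;;
    local)  cmd="$cmd --no-build-on-target" ;;
  esac
  if [ -n "$nodes" ]; then
    cmd="$cmd --on $nodes"
  fi
  printf '%s\n' "$cmd"
}
```

For instance, colmena_apply_cmd target web1,web2 prints the command that would build on web1 and web2 themselves for a single run, regardless of their configuration.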

zhaofengli avatar Jan 02 '22 00:01 zhaofengli

A possible improvement over the current implementation: make use of a local --eval-store to avoid copying the derivations.

NickCao avatar Jul 29 '22 05:07 NickCao