Support building fully remotely

Open · wmertens opened this issue 9 years ago · 28 comments

If you run nixops on your OS X laptop and deploy, it first downloads a bunch of things to your laptop and then fails at some point because you're on Darwin. Next, you log in to the deployed remote host and add your own SSH key so that nix-build can use that host for remote builds. Then you deploy again, and it starts uploading what it just downloaded. This is not a nice user story.

It would be great if there were support for a build host that runs the deployment build and then pushes to the other nodes in the network. One or more of the machines in the network would be marked as workers, and one would be marked as the nixops master. If a nixops master is defined, it performs all downloads and coordinates builds, and all other machines are populated from it.

Something like that? Or, at the very least, documenting a workflow like "deploy a VM from this image, do this to turn it into a regular NixOS machine with nixops, and copy your network definitions onto it" would be nice.

wmertens · Jan 27 '15 22:01

:+1: ran into the exact same issue

copumpkin · Feb 13 '15 18:02

I have a patch somewhere...

domenkozar · Apr 07 '15 09:04

@domenkozar about that patch... send it to me and I'll clean it up? Or is it ready to go?

wmertens · Apr 13 '15 12:04

In theory this is already supposed to work. From the code:

        # If we're not running on Linux, then perform the build on the
        # target machines.  FIXME: Also enable this if we're on 32-bit
        # and want to deploy to 64-bit.
        if platform.system() != 'Linux' and os.environ.get('NIX_REMOTE') != 'daemon':
            if os.environ.get('NIX_REMOTE_SYSTEMS') == None:
                remote_machines = []
                for m in sorted(selected, key=lambda m: m.index):
                    key_file = m.get_ssh_private_key_file()
                    if not key_file: raise Exception("do not know private SSH key for machine ‘{0}’".format(m.name))
                    # FIXME: Figure out the correct machine type of ‘m’ (it might not be x86_64-linux).
                    remote_machines.append("root@{0} {1} {2} 2 1\n".format(m.get_ssh_name(), 'i686-linux,x86_64-linux', key_file))
                    # Use only a single machine for now (issue #103).
                    break
                remote_machines_file = "{0}/nix.machines".format(self.tempdir)
                with open(remote_machines_file, "w") as f:
                    f.write("".join(remote_machines))
                os.environ['NIX_REMOTE_SYSTEMS'] = remote_machines_file
            else:
                self.logger.log("using predefined remote systems file: {0}".format(os.environ['NIX_REMOTE_SYSTEMS']))

            # FIXME: Use ‘--option use-build-hook true’ instead of setting
            # $NIX_BUILD_HOOK, once Nix supports that.
            os.environ['NIX_BUILD_HOOK'] = os.path.dirname(os.path.realpath(nixops.util.which("nix-build"))) + "/../libexec/nix/build-remote.pl"

            load_dir = "{0}/current-load".format(self.tempdir)
            if not os.path.exists(load_dir): os.makedirs(load_dir, 0700)
            os.environ['NIX_CURRENT_LOAD'] = load_dir
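
To make the file format in that excerpt concrete, here is a standalone sketch (not NixOps code) that writes a machines file with the same column order as the `format` call above: `root@host`, comma-separated system types, SSH private key, then the two numbers the excerpt hard-codes (`2 1`), which in Nix's remote-builder format are the maximum number of jobs and the speed factor. `BuildMachine`, `machines_line` and `write_machines_file` are hypothetical names. Unlike the excerpt, which stops after the first machine (issue #103), this writes every machine it is given.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class BuildMachine:
    # Hypothetical record for one remote builder; field names are illustrative.
    ssh_name: str          # host name or address reachable over SSH
    key_file: str          # SSH private key used to log in as root@ssh_name
    systems: List[str]     # Nix system types this machine can build
    max_jobs: int = 2      # same value the excerpt hard-codes
    speed_factor: int = 1  # same value the excerpt hard-codes


def machines_line(m: BuildMachine) -> str:
    """Format one line: user@host, systems, key, max jobs, speed factor."""
    return "root@{0} {1} {2} {3} {4}\n".format(
        m.ssh_name, ",".join(m.systems), m.key_file, m.max_jobs, m.speed_factor)


def write_machines_file(machines: List[BuildMachine], tempdir: str) -> str:
    """Write nix.machines into tempdir and return its path."""
    for m in machines:
        # Mirror the excerpt: refuse to continue without a private key.
        if not m.key_file:
            raise ValueError("do not know private SSH key for machine {0}".format(m.ssh_name))
    path = os.path.join(tempdir, "nix.machines")
    with open(path, "w") as f:
        f.write("".join(machines_line(m) for m in machines))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        builder = BuildMachine("10.0.0.5", "/root/.ssh/id_ed25519", ["i686-linux", "x86_64-linux"])
        path = write_machines_file([builder], d)
        with open(path) as f:
            print(f.read(), end="")
```

The resulting path is what the excerpt exports as `NIX_REMOTE_SYSTEMS`.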

edolstra · Apr 13 '15 13:04

Yeah, the work that still needs to be done is to check whether the platforms match and to provide a way to override the default, forcing remote building on or off.
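
The check described here could look roughly like the following hypothetical `should_build_remotely` helper: it builds remotely when any target's system type differs from the local one, unless the user forces it on or off. The system strings are assumed to be Nix system types such as `x86_64-linux`; the `force` override is an illustration, not an existing NixOps option.

```python
from typing import Iterable, Optional


def should_build_remotely(local_system: str,
                          target_systems: Iterable[str],
                          force: Optional[bool] = None) -> bool:
    """Decide whether to delegate builds to the target machines.

    force=True/False is a hypothetical user override; None means "auto".
    In auto mode, build remotely if any target's system type differs
    from the system we are deploying from.
    """
    if force is not None:
        return force
    return any(t != local_system for t in target_systems)


if __name__ == "__main__":
    print(should_build_remotely("x86_64-darwin", ["x86_64-linux"]))
    print(should_build_remotely("x86_64-linux", ["x86_64-linux"]))
    print(should_build_remotely("x86_64-linux", ["x86_64-linux"], force=True))
```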

domenkozar · Apr 13 '15 13:04

Found it; it was written by @goodwillcoding: https://gist.github.com/goodwillcoding/9c536748ba80da2eeebd

domenkozar · Apr 13 '15 13:04

Hm, calling uname on every machine seems unnecessary. Presumably NixOps already knows the architecture of the remote machines (e.g. from the nixpkgs.system option).
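
A sketch of that comparison without calling `uname` on each target: derive the local Nix system type from Python's `platform` module and compare it with the target's `nixpkgs.system` value, which NixOps is assumed to already know. The mapping tables and the `nix_system` name are hypothetical, and the tables are partial rather than an exhaustive list of Nix platforms.

```python
import platform
from typing import Optional

# Partial, illustrative mapping from Python's platform values to Nix names.
_ARCH = {"x86_64": "x86_64", "amd64": "x86_64", "i686": "i686", "i386": "i686",
         "aarch64": "aarch64", "arm64": "aarch64"}
_OS = {"Linux": "linux", "Darwin": "darwin"}


def nix_system(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return a Nix system string such as 'x86_64-linux' for the given platform.

    With no arguments, describes the machine running this code.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    try:
        return "{0}-{1}".format(_ARCH[machine.lower()], _OS[system])
    except KeyError:
        raise ValueError("unknown platform: {0}/{1}".format(system, machine)) from None


if __name__ == "__main__":
    target_system = "x86_64-linux"  # stand-in for a machine's nixpkgs.system
    print(nix_system(), "needs remote build:", nix_system() != target_system)
```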

edolstra · Apr 13 '15 14:04

@edolstra actually there are two problems and my report above doesn't highlight them properly.

  • Remote building doesn't work out of the box, due to nix-build not getting SSH credentials from nixops
  • When building remotely does work, all packages are still downloaded locally and pushed from local. This is very slow.

wmertens · Apr 13 '15 14:04

How about designating a push host from which the other systems get populated? It would default to localhost, but if it were set to the build host, and once the SSH behaviour is fixed, it would behave the way I'd prefer.

wmertens · Apr 13 '15 14:04

The problem with a push host is that there is no guarantee that it has connectivity to the other machines (e.g. if you deploy a network with EC2 machines in different regions).

edolstra · Apr 15 '15 12:04

@edolstra But in that case you simply keep the push host as localhost and then things are as they are now.

I'm now wondering how to make that work. The push host needs to initiate the builds, so it needs to run nix-daemon and the .drv files need to make it over there, but other than that it doesn't need anything, right?

So deploying from scratch with a remote push host would be:

  • nixops scans config for hosts, brings up servers
  • if remote push host
    • build .drv files with correct platform
    • once the push host is up, copy the .drv files over, run nix-daemon and instantiate

Would that work?
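
Roughly, the build half of this plan maps onto existing Nix commands. The sketch below only assembles the command lines (it does not run them) for a hypothetical `push_host_build_plan` helper; it assumes the `.drv` paths were produced locally by `nix-instantiate` for the right platform and that root SSH access to the push host already works. The store path in the demo is a placeholder.

```python
from typing import List


def push_host_build_plan(push_host: str, drv_paths: List[str]) -> List[List[str]]:
    """Return argv lists that build already-instantiated derivations on a push host.

    Assumes the .drv files were instantiated locally and that root@push_host
    is reachable over SSH.
    """
    target = "root@{0}".format(push_host)
    return [
        # Copy the .drv files (and their closures) into the push host's store.
        ["nix-copy-closure", "--to", target] + drv_paths,
        # Realise them there; the push host substitutes or builds on its own.
        ["ssh", target, "nix-store", "--realise"] + drv_paths,
    ]


if __name__ == "__main__":
    drvs = ["/nix/store/placeholder-nixos-system.drv"]
    for argv in push_host_build_plan("builder.example.org", drvs):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` once the plan looks right.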


wmertens · Apr 15 '15 13:04

Forgot to add:

  • Once everything is built, use the push host to push everything and activate as usual
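
The final push step could then be driven from the push host itself. This hypothetical `push_and_activate_plan` helper again only builds command lines; it assumes the push host has root SSH access to every target (which, per the connectivity concern above, is not guaranteed) and uses roughly the activation sequence `nixos-rebuild` performs: point the system profile at the new configuration, then run `switch-to-configuration switch`. Store paths in the demo are placeholders.

```python
from typing import Dict, List

# Standard NixOS system profile location.
SYSTEM_PROFILE = "/nix/var/nix/profiles/system"


def push_and_activate_plan(push_host: str, outputs: Dict[str, str]) -> List[List[str]]:
    """Return argv lists that push each built system from the push host and activate it.

    outputs maps a target host name to the store path of its built NixOS
    system (hypothetical input, e.g. collected after the build step).
    """
    push = "root@{0}".format(push_host)
    plan: List[List[str]] = []
    for host, out_path in sorted(outputs.items()):
        target = "root@{0}".format(host)
        # Copy the closure push host -> target, bypassing the deploying laptop.
        plan.append(["ssh", push, "nix-copy-closure", "--to", target, out_path])
        # Make the new system the current generation of the system profile...
        plan.append(["ssh", target, "nix-env", "-p", SYSTEM_PROFILE, "--set", out_path])
        # ...and activate it.
        plan.append(["ssh", target, out_path + "/bin/switch-to-configuration", "switch"])
    return plan


if __name__ == "__main__":
    demo = {"web1": "/nix/store/placeholder-nixos-system-web1"}
    for argv in push_and_activate_plan("builder.example.org", demo):
        print(" ".join(argv))
```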


wmertens · Apr 15 '15 13:04

Just for reference, this also addresses #195

domenkozar · Apr 19 '15 22:04

I just downloaded and started playing with nixops, starting with examples from the manual, and hit this issue pretty quickly. I'm also too new to Nix{,OS,ops} to unstick myself even by manually fixing things up as described in this issue description. No fun!

ryanartecona · Oct 28 '15 18:10

:+1:

I'm trying to deploy and it's taken ~8 hours so far.

jezen · Jan 16 '16 08:01

@aszlig can you post here your findings? Just for future reference :)

domenkozar · Feb 27 '16 11:02

The problem with NIX_REMOTE_SYSTEMS is that it still needs setup on the deployment machine, because you can't just "insert" build hooks into an already-running Nix daemon. So it only works if you unset NIX_REMOTE, but then you need write access to the local store.

Another idea I had was to instantiate the individual machines, use nix-copy-closure to copy the .drv files to the target machines, and run a nix-build over there. The problem, however, is that the results are still needed on the deployment machine, and we can't copy them back unless we're in trusted-users, so some setup is still required. Also, this doesn't properly divide the builds among all the machines in the deployment, so slow target machines could still be the bottleneck.

aszlig · Feb 27 '16 15:02

I made issue #483 for allowing builds on OS X to work out of the box for everyone.

I will leave this one open to discuss the push-host idea more. @aszlig would the push-host described above fix your comments?

wmertens · Aug 03 '16 12:08

FWIW, I put together something to easily set up a Docker-based Linux remote builder (https://github.com/holidaycheck/nix-remote-builder). Based on the code that @edolstra pasted, having a remote builder configured would be a workaround for this, I assume?

Obviously, not needing the remote builder in the first place would be better, I suppose.

gilligan · Sep 27 '16 11:09

Relevant https://github.com/NixOS/nixops/pull/412

domenkozar · Dec 12 '16 11:12

The lack of this feature is unfortunately a dealbreaker for me. I love Nix, but I also have to run on macOS. I was surprised to learn that this is not how nixops actually works. IMHO it severely limits the potential user base for nixops and puts a damper on its growth relative to other tools like Terraform.

It also seems that this would solve a number of other issues brought up in the past: https://github.com/NixOS/nixops/issues/560, https://github.com/NixOS/nixops/issues/976.

samuela · Oct 03 '20 22:10

FWIW, I did end up getting this working just fine with the help of this article:

https://medium.com/@zw3rk/provisioning-a-nixos-server-from-macos-d36055afc4ad

jezen · Oct 05 '20 07:10

I still don't understand why we need to download everything on the client side. Couldn't we just add an option to push the configuration.nix file to all the servers and then run a switch on each server? Most of the time the servers would just download prebuilt binaries from the NixOS binary cache, which is much more efficient than downloading them from the client (which may have a very poor connection). And if one really wants to avoid building the same things several times when there are multiple servers, we could set up an optional cache/builder machine. Also, this issue is not only about macOS: you have the same trouble when the client and the server run on different architectures (in my case, the client is x86_64 and the server is an aarch64 Raspberry Pi).
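
A minimal sketch of this proposal, assuming each server's configuration.nix is already self-contained (everything it imports is copied too, which is the hard part). The `push_config_plan` helper is hypothetical and only returns the `scp`/`ssh` command lines; it installs the file at `/etc/nixos/configuration.nix`, where `nixos-rebuild` looks by default, and then rebuilds on the server.

```python
from typing import Dict, List


def push_config_plan(configs: Dict[str, str]) -> List[List[str]]:
    """Return argv lists that copy each server's configuration and rebuild it in place.

    configs maps a host name to the local path of that host's configuration.nix,
    assumed to be self-contained (hypothetical input).
    """
    plan: List[List[str]] = []
    for host, local_path in sorted(configs.items()):
        target = "root@{0}".format(host)
        # Install the configuration where nixos-rebuild looks for it by default.
        plan.append(["scp", local_path, target + ":/etc/nixos/configuration.nix"])
        # Build on the server itself, for its own platform, fetching substitutes
        # from the binary cache instead of through the client's connection.
        plan.append(["ssh", target, "nixos-rebuild", "switch"])
    return plan


if __name__ == "__main__":
    for argv in push_config_plan({"pi.example.org": "./pi/configuration.nix"}):
        print(" ".join(argv))
```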

tobiasBora · Oct 07 '20 11:10

@tobiasBora It's not as easy as it sounds because the configuration needs to be complete. So in fact the configuration.nix would need to set the environment to what it is on the build host, import the nixops deployment configuration and extract its host configuration, and include all its nix dependencies. Possible, but not trivial.

But indeed, adding your own cache is probably pretty easy, and then what you propose would be really nice, with the added bonus that running nixos-rebuild on the server itself would also work.

wmertens · Oct 07 '20 12:10

I'm not super familiar with nixops, but this doesn't actually sound that hard. In fact, IIUC all these steps could still happen on the client machine: it could evaluate the nixops expression, send the right subexpressions to the right machines, and then run nixos-rebuild on each of those machines over SSH.

samuela · Oct 07 '20 17:10

Indeed, this is all technically possible and even desirable, and if this were JavaScript someone would probably have already done it.

I know there is a Haskell parser for Nix; perhaps that project can do the heavy lifting.

There's also nixjs, written by @svanderburg, but I don't know whether it is mature enough to extract dependency trees from Nix.

Even something that copies all the files that might be needed instead of only the files that are needed would be great.
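
A rough sketch of that over-approximation: collect every file under the deployment directory rather than resolving which ones the Nix expressions actually import. `files_to_copy` and its `skip_dirs` default are hypothetical; anything imported from outside the directory (for example via `<nixpkgs>` or `../` paths) is not covered and would still need separate handling.

```python
import os
from typing import Iterable, List


def files_to_copy(root: str, skip_dirs: Iterable[str] = (".git", "result")) -> List[str]:
    """Over-approximate a deployment's inputs: every file under root.

    Copies more than evaluation strictly needs, but avoids parsing Nix to find
    imports. Returns sorted paths relative to root.
    """
    skip = set(skip_dirs)
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

The resulting relative paths could then be handed to rsync or scp, with the same layout recreated on each server.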

wmertens · Oct 07 '20 20:10

Yeah, this seems like a good way to go. And pretty easy too!

samuela · Oct 07 '20 20:10

Wouldn't it be possible to just push the whole configuration as a first step? (Special care may be needed for secrets, but I'm not sure.)

tobiasBora · Oct 07 '20 20:10