
Support running executables from mounted paths in exec driver (like docker driver)

Open arianvp opened this issue 5 years ago • 23 comments

Nomad version

Nomad v0.10.5

Operating system and Environment details

NixOS 20.03

Issue

I want to run Nomad jobs whose payloads are installed with the Nix package manager, which is a bit of an odd-ball in that it doesn't really adhere to the FHS. This means that Nomad's strategy of copying (or hardlinking) files from /lib and /bin doesn't work.

All the binaries on the system live in a content-addressable, read-only path called the Nix store, and the dependencies of those binaries are also in the Nix store.

For example, the env binary lives in the coreutils package and refers to the librt, libacl, libattr, libpthread and libc packages in a content-addressable manner:

$ ls -tal /usr/bin/env
lrwxrwxrwx 1 root root 66 Jul  9 20:27 /usr/bin/env -> /nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env
$ ldd /usr/bin/env
	linux-vdso.so.1 (0x00007ffc91799000)
	librt.so.1 => /nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/librt.so.1 (0x00007feceb533000)
	libacl.so.1 => /nix/store/rq9vqzjqay8fz8qdidmbgs3lqpq0y6zb-acl-2.2.53/lib/libacl.so.1 (0x00007feceb528000)
	libattr.so.1 => /nix/store/b3yikpnxly8vgr2c0sspwqckx44hb474-attr-2.4.48/lib/libattr.so.1 (0x00007feceb520000)
	libpthread.so.0 => /nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/libpthread.so.0 (0x00007feceb4ff000)
	libc.so.6 => /nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/libc.so.6 (0x00007feceb340000)
	/nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/ld-linux-x86-64.so.2 => /nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib64/ld-linux-x86-64.so.2 (0x00007feceb53f000)

If I want to run a Nix-managed binary on Nomad, all of the binary's dependencies need to be in the chroot. These can be queried in Nix using the --query command:

$ nix-store --query --references  /nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env
/nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30
/nix/store/b3yikpnxly8vgr2c0sspwqckx44hb474-attr-2.4.48
/nix/store/rq9vqzjqay8fz8qdidmbgs3lqpq0y6zb-acl-2.2.53
/nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31

I could put those in the chroot_env setting in the Nomad client config; however, the paths each binary depends on obviously differ per binary, and are thus dynamic. chroot_env, which is static, is therefore not an option.
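For context, chroot_env is a static map from host paths to paths inside the task chroot, fixed when the agent starts. The entries below are illustrative, not a working setup; the point is that a per-binary closure would need entries whose hashes change on every rebuild:

```hcl
client {
  chroot_env {
    # host path -> path inside the task chroot; fixed at agent start
    "/bin" = "/bin"
    "/lib" = "/lib"
    # a per-binary closure would need entries like this one, whose
    # content hash changes whenever the package is rebuilt:
    "/nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31" = "/nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31"
  }
}
```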

To solve this, I decided to mount the /nix/store folder into the container that Nomad is running, using Nomad's volume and volume_mount support. This is fine to do, as /nix/store is a read-only file system containing only public files.

However, with driver = "exec" I get an error message:

rpc error: code = Unknown desc = file /nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env not found under path /tmp/NomadClient003956682/ac65b7db-dc46-aa5b-46ba-8aed0bfa55d9/example"

which seems to suggest that Nomad checks whether the command exists before the paths in volume_mounts are mounted. If I change to driver = "docker" (and don't use any executables from the Docker image, but again refer to an executable in the volume_mount), then it works just fine! In the docker driver, the check for whether the command exists seems to happen after the volume_mounts are mounted.
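A toy shell illustration of the ordering difference (this is an assumption about the drivers' behaviour, with the bind mount simulated by a plain copy and a hypothetical store path):

```shell
taskdir=$(mktemp -d)
target="$taskdir/nix/store/example-coreutils/bin/env"

# exec-driver-style: validate the command before any volume is mounted
[ -x "$target" ] && before=found || before=missing

# simulate the volume_mount landing (a real setup bind-mounts /nix/store)
mkdir -p "$(dirname "$target")"
cp "$(command -v env)" "$target"

# docker-driver-style: validate the command after the mounts are in place
[ -x "$target" ] && after=found || after=missing

echo "before=$before after=$after"
```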

I would really like exec and docker to behave the same here. Currently I'm using the docker driver purely as a workaround so that I can run bind-mounted executables, but I would rather use the exec driver directly.

Proposed fix

Change the exec driver behaviour to check for the existence of the command after all paths have been mounted, so that I do not have to use the docker driver just to trick Nomad into executing a binary in a mounted path.

Reproduction steps

  1. Install the Nix package manager

  2. Install coreutils using nix:

    $ nix-build '<nixpkgs>' -A coreutils --no-out-link
    /nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31
    
  3. If needed, replace the path to coreutils in the .nomad files below if it differs. As the paths are content-addressed, they may vary across OS (Mac vs Linux) and across versions of the Nix package set

  4. sudo nomad agent -dev ./config.hcl

  5. nomad run example3.nomad and observe that with the docker driver, env gets executed just fine

    $ nomad run example3.nomad
    $ nomad alloc logs 8fd475a2
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    HOSTNAME=618d19b12602
    NOMAD_ALLOC_DIR=/alloc
    NOMAD_ALLOC_ID=8fd475a2-0074-8df4-6c74-cf6bb41fc6f4
    NOMAD_ALLOC_INDEX=0
    NOMAD_ALLOC_NAME=example3.example[0]
    NOMAD_CPU_LIMIT=100
    NOMAD_DC=dc1
    NOMAD_GROUP_NAME=example
    NOMAD_JOB_NAME=example3
    NOMAD_MEMORY_LIMIT=300
    NOMAD_NAMESPACE=default
    NOMAD_REGION=global
    NOMAD_SECRETS_DIR=/secrets
    NOMAD_TASK_DIR=/local
    NOMAD_TASK_NAME=example
    HOME=/root
    
  6. nomad run example2.nomad and observe that with the exec driver, it reports the env executable cannot be found

        2020-07-09T22:11:28.132+0200 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=ac65b7db-dc46-aa5b-46ba-8aed0bfa55d9 task=example error="failed to launch command with executor: rpc error: code = Unknown desc = file /nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env not found under path /tmp/NomadClient003956682/ac65b7db-dc46-aa5b-46ba-8aed0bfa55d9/example"
    

Job file (if appropriate)

config.hcl:

client {
  host_volume "nix-store" {
    path = "/nix/store"
    read_only = true
  }
}

example3.nomad:

job "example3" {
  datacenters = ["dc1"]
  type = "batch"
  group "example" {
    count = 1
    volume "nix-store" {
      type      = "host"
      source    = "nix-store"
      read_only = true
    }
    task "example" {
      driver = "docker"
      volume_mount {
        volume      = "nix-store"
        destination = "/nix/store"
      }
      config {
        # NOTE: This could even be an empty image! like 'scratch' but docker
        # needs a non-empty image to start. Nothing from the container itself
        # is used
        image = "alpine"
        command = "/nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env"
      }
    }
  }
}

example2.nomad

job "example2" {
  datacenters = ["dc1"]
  type = "batch"
  group "example" {
    volume "nix-store" {
      type      = "host"
      source    = "nix-store"
      read_only = true
    }
    count = 1
    task "example" {
      driver = "exec"
      volume_mount {
        volume      = "nix-store"
        destination = "/nix/store"
      }
      config {
        command = "/nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env"
      }
    }
  }
}

Nomad Client logs (if appropriate)

    2020-07-09T22:11:28.132+0200 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=ac65b7db-dc46-aa5b-46ba-8aed0bfa55d9 task=example error="failed to launch command with executor: rpc error: code = Unknown desc = file /nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31/bin/env not found under path /tmp/NomadClient003956682/ac65b7db-dc46-aa5b-46ba-8aed0bfa55d9/example"

arianvp avatar Jul 09 '20 20:07 arianvp

I think we can fix this by skipping executor_linux.go's logic for finding the absolute path when the command is already absolute. Does this sound sane? To me it does.

I think the mounting only happens when libcontainer starts executing the binary; it's not something Nomad does itself explicitly. So assuming that when a user gives an absolute path we don't have to check whether it's in the taskdir might be a fair thing to do, though I can understand this is a bit of an odd edge case.

diff --git a/drivers/shared/executor/executor_linux.go b/drivers/shared/executor/executor_linux.go
index 77f133a81..7a0245408 100644
--- a/drivers/shared/executor/executor_linux.go
+++ b/drivers/shared/executor/executor_linux.go
@@ -115,31 +115,36 @@ func (l *LibcontainerExecutor) Launch(command *ExecCommand) (*ProcessState, erro
 	}
 	l.container = container
 
-	// Look up the binary path and make it executable
-	absPath, err := lookupTaskBin(command)
+	var path string
+	if filepath.IsAbs(command) {
+		path := command
+	} else {
+		// Look up the binary path and make it executable
+		absPath, err := lookupTaskBin(command)
 
-	if err != nil {
-		return nil, err
-	}
+		if err != nil {
+			return nil, err
+		}
 
-	if err := makeExecutable(absPath); err != nil {
-		return nil, err
-	}
+		if err := makeExecutable(absPath); err != nil {
+			return nil, err
+		}
 
-	path := absPath
+		path := absPath
 
-	// Ensure that the path is contained in the chroot, and find it relative to the container
-	rel, err := filepath.Rel(command.TaskDir, path)
-	if err != nil {
-		return nil, fmt.Errorf("failed to determine relative path base=%q target=%q: %v", command.TaskDir, path, err)
-	}
+		// Ensure that the path is contained in the chroot, and find it relative to the container
+		rel, err := filepath.Rel(command.TaskDir, path)
+		if err != nil {
+			return nil, fmt.Errorf("failed to determine relative path base=%q target=%q: %v", command.TaskDir, path, err)
+		}
 
-	// Turn relative-to-chroot path into absolute path to avoid
-	// libcontainer trying to resolve the binary using $PATH.
-	// Do *not* use filepath.Join as it will translate ".."s returned by
-	// filepath.Rel. Prepending "/" will cause the path to be rooted in the
-	// chroot which is the desired behavior.
-	path = "/" + rel
+		// Turn relative-to-chroot path into absolute path to avoid
+		// libcontainer trying to resolve the binary using $PATH.
+		// Do *not* use filepath.Join as it will translate ".."s returned by
+		// filepath.Rel. Prepending "/" will cause the path to be rooted in the
+		// chroot which is the desired behavior.
+		path = "/" + rel
+	}
 
 	combined := append([]string{path}, command.Args...)
 	stdout, err := command.Stdout()

arianvp avatar Jul 09 '20 20:07 arianvp

Hi @arianvp! That's an interesting proposal, but it looks like anyone who uses an absolute path would no longer enjoy validation before we build the chroot, or the step that makes the command executable, so there's a bit of a degraded experience in that case. That's not necessarily a show-stopper, but it's something to consider. I'm pretty sure a few of the Nomad folks are more familiar with Nix than I am (maybe @notnoop?) and might have an alternate approach in mind.

Also, I ran this myself (though I haven't tested it against a Nix environment yet) and had to make the following adjustments to the patch to get it to compile:

@@ -115,32 +115,36 @@ func (l *LibcontainerExecutor) Launch(command *ExecCommand) (*ProcessState, erro
        }
        l.container = container

-       // Look up the binary path and make it executable
-       absPath, err := lookupTaskBin(command)
-
-       if err != nil {
-               return nil, err
-       }
+       var path string
+       if filepath.IsAbs(command.Cmd) {
+               path = command.Cmd
+       } else {
+               // Look up the binary path and make it executable
+               absPath, err := lookupTaskBin(command)

-       if err := makeExecutable(absPath); err != nil {
-               return nil, err
-       }
+               if err != nil {
+                       return nil, err
+               }

-       path := absPath
+               if err := makeExecutable(absPath); err != nil {
+                       return nil, err
+               }

-       // Ensure that the path is contained in the chroot, and find it relative to the container
-       rel, err := filepath.Rel(command.TaskDir, path)
-       if err != nil {
-               return nil, fmt.Errorf("failed to determine relative path base=%q target=%q: %v", command.TaskDir, path, err)
-       }
+               path = absPath

-       // Turn relative-to-chroot path into absolute path to avoid
-       // libcontainer trying to resolve the binary using $PATH.
-       // Do *not* use filepath.Join as it will translate ".."s returned by
-       // filepath.Rel. Prepending "/" will cause the path to be rooted in the
-       // chroot which is the desired behavior.
-       path = "/" + rel
+               // Ensure that the path is contained in the chroot, and find it relative to the container
+               rel, err := filepath.Rel(command.TaskDir, path)
+               if err != nil {
+                       return nil, fmt.Errorf("failed to determine relative path base=%q target=%q: %v", command.TaskDir, path, err)
+               }

+               // Turn relative-to-chroot path into absolute path to avoid
+               // libcontainer trying to resolve the binary using $PATH.
+               // Do *not* use filepath.Join as it will translate ".."s returned by
+               // filepath.Rel. Prepending "/" will cause the path to be rooted in the
+               // chroot which is the desired behavior.
+               path = "/" + rel
+       }
        combined := append([]string{path}, command.Args...)
        stdout, err := command.Stdout()
        if err != nil {

tgross avatar Jul 10 '20 13:07 tgross

I know Mitchell has been diving into Nix recently, but I'm not sure he's involved with Nomad or any programming stuff directly :p But yeah, it would be great if someone with a Nomad background and some Nix experience could take a look at this.

I'm trying to come up with a use case for non-Nix people where this behaviour is desired too. Personally, I expected Nomad to bind-mount /bin and /lib into the container instead of copying the files; I was a bit surprised by this behaviour in the first place. I'm not sure changing how the chroot is built up is desirable at this point, though it might be worth discussing. In that case we wouldn't have to special-case absolute paths, but would always use bind mounts as the main primitive for building the chroot. We would then have to move the check for whether the executable exists and is executable into libcontainer, though. I suppose libcontainer itself already checks whether the executable exists; it will just not set the executable bit.

However, I don't think the "executable bits will be set" part is directly documented anywhere, so perhaps it's worth dropping as a step.

@tgross The patch was written on my mobile phone so thanks for fixing it to compile :)

arianvp avatar Jul 10 '20 16:07 arianvp

There was a bit of brainstorming in this area during the latest Nomad Community Office Hours. IIRC the question we got was about the possibility of having the Nomad client fingerprint commands (e.g. ensure a task will be scheduled on a node with a minimum version of curl installed, etc.). A similar mechanism already exists for device drivers; we could potentially expand on that and provide a way of fingerprinting "packages". One could imagine folks authoring plugins for fingerprinting things like coreutils or Nix packages, and having the Nomad client understand items returned by the plugin to be executable (along with metadata useful in scheduling). cc @cgbaker

shoenig avatar Jul 10 '20 16:07 shoenig

Funnily enough it does work when I do this:

export PATH=/nix/store/x0jla3hpxrwz76hy9yckg1iyc9hns81k-coreutils-8.31
job "example2" {
  datacenters = ["dc1"]
  type = "batch"
  group "example" {
    volume "nix-store" {
      type      = "host"
      source    = "nix-store"
      read_only = true
    }
    count = 1
    task "example" {
      driver = "exec"
      volume_mount {
        volume      = "nix-store"
        destination = "/nix/store"
      }
      config {
        command = "env"
      }
    }
  }
}

It does work?

Somehow, the binary being in PATH tricks Nomad into not failing early.

It's a workaround for my problem, but a tad surprising to say the least :)
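A heavily simplified sketch of one plausible explanation (this is an assumption, not Nomad's actual code): an absolute command path is validated against the task directory up front, while a bare name is left to PATH resolution at exec time, by which point the mount exists.

```shell
taskdir=$(mktemp -d)   # stand-in for the task chroot, still empty

check_command() {
  case "$1" in
    /*) # absolute path: checked against the (not yet mounted) task dir
        [ -x "$taskdir$1" ] && echo "ok" || echo "not found under $taskdir" ;;
    *)  # bare name: resolution deferred until the process execs in the chroot
        echo "deferred to exec-time PATH lookup" ;;
  esac
}

check_command /nix/store/example-coreutils/bin/env
check_command env
```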

arianvp avatar Jul 11 '20 16:07 arianvp

@arianvp Where did you actually have to export PATH=... to make the example work?

I was unable to get that to work :( That said, I was able to get things running with artifacts, either by providing a standalone closure tarball or by providing wrapper scripts calling paths from the mounted /nix/store. I might have to go with closure artifacts, although I don't like the amount of disk space they require.

datakurre avatar Aug 18 '20 17:08 datakurre

I'll share my nix files when I'm home!

arianvp avatar Aug 18 '20 18:08 arianvp

@arianvp FWIW we decided to go forward with using artifact tarballs instead of mounting /nix/store. Additional benefits of it "working out of the box" are that we don't need to expose more paths than we want, we don't need steps outside Nomad to provision /nix/store, and we don't even need Nix on the Nomad nodes.

That said, it would be nice if there were a way for the exec driver to use the /nix/store available on the host machine: a way to provide an allow list of required paths from /nix/store (or another allowed root) and then magically have them in the exec chroot, with the ability to run the command from those paths...

I am new to Nomad, so I don't know all the consequences of this. For now, we plan to play it safe within the limits of exec with artifact.

datakurre avatar Aug 19 '20 09:08 datakurre

Adding PATH to the job's environment works for me. Once I did that, and configured the host volume for /nix/store, @arianvp's example worked perfectly.

job "example2" {
  datacenters = ["dc1"]
  type = "batch"
  group "example" {
    volume "nix-store" {
      type      = "host"
      source    = "nix-store"
      read_only = true
    }
    count = 1
    task "example" {
      driver = "exec"
      volume_mount {
        volume      = "nix-store"
        destination = "/nix/store"
      }
      env {
        PATH = "/nix/store/9v78r3afqy9xn9zwdj9wfys6sk3vc01d-coreutils-8.31/bin"
      }
      config {
        command = "env"
      }
    }
  }
}

utsl42 avatar Oct 07 '20 14:10 utsl42

So I've been experimenting with adding Nix support to the exec driver, and got a working proof of concept: https://github.com/manveru/nomad/commit/9815c22f874edc0151db8a9c0e770f2e9be467ea

It's not using bind mounts yet, because I don't want to worry about cleanup and would rather let Nomad handle it for me. That means startup performance is a bit slow, depending on how large your derivation is, since every dependency path from the Nix store is copied. It also needs flakes, as well as Nix being in Nomad's PATH. For some reason this also seems necessary in the Nomad agent config:

client {
  enabled = true
  chroot_env {
    "/etc/passwd" = "/etc/passwd"
  }
}

Minimal example:

job "tiny" {
  datacenters = ["dc1"]
  type = "batch"

  group "tiny" {
    task "hello" {
      driver = "exec"

      config {
        command = "/bin/cowsay"
        args = [ "Nix rocks!" ]
        flake = "github:NixOS/nixpkgs#cowsay"
      }

      env {
        LC_ALL = "C"
      }
    }
  }
}

It's possible that my symlink handling isn't perfect yet, and I kind of winged the way the flake option in the job config is parsed; I'm not sure that's the right way to handle it. It also lacks any kind of tests, so there be dragons.

Anyway, I hope this can help someone until I get around to making a proper PR, otherwise feel free to pick up the code and reuse it as you wish.

manveru avatar Jan 07 '21 14:01 manveru

Perhaps what could also work would be a thin wrapper around the systemd-nspawn task driver. You'd specify the top-level Nix expression, and Nomad would essentially just run nix-build/nix build on it, which would output a NixOS system configured to be a container, sort of like NixOS currently does with its containers. Then Nomad would just execute that with the systemd-nspawn task driver. It could even be done without any modifications to Nomad itself, if you don't mind manually building the derivation. Also, CSI would be nice, but that's a topic for another issue.

EDIT: we can get very close right now even; all that is needed is the ability to get the output of an init task and pass it as an argument into systemd-nspawn.

MagicRB avatar Feb 03 '21 13:02 MagicRB

Yeah, I'm experimenting with an approach of having a server that takes URLs like /github/<org>/<repo>/ref/<branch>/flakeAttr and serves the resulting tarball for the nspawn driver. The obvious issue is security, since there's no way to do authentication and we don't want just anyone to be able to build images... so having the driver do the build instead would be much preferred.

I think in the meanwhile I'll just set up a Hydra jobset to build the images and use those URLs, but those URLs are going to be a pain to update manually.

For now my fork of Nomad is working very nicely in production, and we'll keep using it until I find the time to test out nspawn more in depth and maybe add some options to the driver instead.

manveru avatar Feb 05 '21 10:02 manveru

Hmm, I too was thinking about fetching nspawn tarballs from a URL, but then you couldn't configure the container at runtime unless you do some magic in the container's init process, and definitely not with NixOS modules. Actually, I have another idea I'll test soon: it may be possible to nspawn a really light, thin container containing just Nix and bash, just the basics, and then build a Nix derivation of a NixOS system inside it; you could then activate it and boom, Nix on Nomad. I've managed to test the "build a Nix expression in an nspawn container" part; now I just have to run a NixOS container through Nomad on a non-NixOS system. I'll report back soon.

MagicRB avatar Feb 05 '21 10:02 MagicRB

What I do to address this is the following:

I nix-build a Nomad job:

pkgs.writeText "nomad.hcl" ''
  job "test" {
    group "test" {
      task "test" {
        driver = "exec"
        config {
          command = "/bin/sh"
          args = ["-c", <<-EOF
            nix-store --realise ${pkgs.nginx}
            exec ${pkgs.nginx}/bin/nginx
          EOF
          ]
        }
      }
    }
  }
''

Which is built locally (or in CI) and then pushed to the cache. The output of this is:

job "test" {
  group "test" {
    task "test" {
      driver = "exec"
      config {
        command = "/bin/sh"
        args = ["-c", <<-EOF
          nix-store --realise /nix/store/asdj123971298371-nginx1.18
          exec /nix/store/asdj123971298371-nginx1.18/bin/nginx
        EOF
        ]
      }
    }
  }
}

This causes Nomad to never evaluate any Nix code; it just pulls a binary from the cache.

nix-store --realise will download the binaries, and then the binary is executed.

This technique has worked really well for me so far.

arianvp avatar Feb 05 '21 10:02 arianvp

Yeah, but um, if you wanted to pull secrets from Vault, you couldn't pass them to nginx through the normal Nix infrastructure. The solution I'm hoping for would reuse all of the NixOS infrastructure: you'd just create a normal nixosSystem with boot.isContainer, and you could start nginx and such.

EDIT: but the idea with CI is great; I'll do that for auto-refreshing Nomad with the new store paths.

MagicRB avatar Feb 05 '21 10:02 MagicRB

Yes; instead you can use the nspawn driver and point it at a nixosSystem, with the same pre-fetching trick. I've recently written code for this as well, but it's a bit messy. It isn't open source yet, but I'll see if I can polish it enough this weekend to publish.

The problem with nspawn is that I don't know how to inject a pre-task to fetch the Nix store path containing the NixOS system that nspawn should start up in the next step.

arianvp avatar Feb 05 '21 11:02 arianvp

You can just nspawn the same static container, containing Nix, bash and not much more, and then inside it activate the specific config you want. But yeah, if we could pass data from prestart into the main task, this whole thing would get a lot simpler.

MagicRB avatar Feb 05 '21 11:02 MagicRB

Well, I just hit this: https://github.com/NixOS/nixpkgs/issues/40367

EDIT: I figured out that upstream systemd isn't able to boot a NixOS system, while the downstream systemd in NixOS can; I'm guessing it's this patch.

MagicRB avatar Feb 05 '21 11:02 MagicRB

The same issue also applies to GNU Guix. I tried the same solution you did by mounting /gnu/store as a host volume and trying to exec the binary there, and got the same result.

sundbry avatar Apr 23 '21 06:04 sundbry

FWIW, I started working on an implementation: https://github.com/input-output-hk/nomad-driver-nspawn-nixos-container

Feedback is very welcome.

My personal conclusion from this conversation is:

  • I can bind-mount the Nix store, after having made sure the nixos-container system config has actually been built by the driver.

  • I need to be aware of a couple of symlinked and bind-mounted files that are checked early on by nspawn. (It would be unfortunate to have to depend on a patched nspawn.)

  • Do I need to run NixOS's stage 0 or the activation script or something before doing nspawn, to get all the FHS symlinks in place, and then actually copy or otherwise take care of those that make nspawn fail?

Anything else I need to know to kick the tires?

/cc @manveru @disassembler @cleverca22

@kamadorueda and while I'm at it, would you mind helping lay out some foundational thoughts on how to implement a fluidattacks/makes driver? (Based on systemd-nspawn, I'd suppose.)

blaggacao avatar Sep 10 '21 04:09 blaggacao

@blaggacao this is the patch I use; it basically strips out a bunch of the symlink validation and some other stuff like makeExecutable and the path hacking. With this, Nomad on Guix (and probably Nix) is able to run programs from /gnu/store.

diff --git a/drivers/shared/executor/executor_linux.go b/drivers/shared/executor/executor_linux.go
index 3d5be7e80..0bd8ee1b8 100644
--- a/drivers/shared/executor/executor_linux.go
+++ b/drivers/shared/executor/executor_linux.go
@@ -116,33 +116,7 @@ func (l *LibcontainerExecutor) Launch(command *ExecCommand) (*ProcessState, erro
 	}
 	l.container = container
 
-	// Look up the binary path and make it executable
-	absPath, err := lookupTaskBin(command)
-
-	if err != nil {
-		return nil, err
-	}
-
-	if err := makeExecutable(absPath); err != nil {
-		return nil, err
-	}
-
-	path := absPath
-
-	// Ensure that the path is contained in the chroot, and find it relative to the container
-	rel, err := filepath.Rel(command.TaskDir, path)
-	if err != nil {
-		return nil, fmt.Errorf("failed to determine relative path base=%q target=%q: %v", command.TaskDir, path, err)
-	}
-
-	// Turn relative-to-chroot path into absolute path to avoid
-	// libcontainer trying to resolve the binary using $PATH.
-	// Do *not* use filepath.Join as it will translate ".."s returned by
-	// filepath.Rel. Prepending "/" will cause the path to be rooted in the
-	// chroot which is the desired behavior.
-	path = "/" + rel
-
-	combined := append([]string{path}, command.Args...)
+	combined := append([]string{command.Cmd}, command.Args...)
 	stdout, err := command.Stdout()
 	if err != nil {
 		return nil, err

If you do make a special plugin to support NixOS, I think you should consider keeping the systemd-nspawn dependency out of it and just letting the OS execute the path. I could see that getting in the way when you don't want to run in a container, or don't want that dependency added (I'm looking at you, systemd!). Most of the value added, I think, would be in a) fixing this issue so you can actually run a program without hacking in patches, and b) allowing a Nix recipe to be run (including automatically building it), which would be awesome.

sundbry avatar Sep 10 '21 04:09 sundbry

@sundbry Thank you for this tale-telling diff.

If you do make a special plugin to support nixos, I think maybe you should consider keeping the systemd-nspawn dependency out of it and just let the OS execute the path.

My current deployment scenario involves effectively scheduling (nomad) the scheduler (systemd).

I see, however, that there is room for a nomad-driver-nix, and both drivers would indeed have quite some code overlap.

EDIT: Or did you mean: let a nomad-driver-nix execute systemd-nspawn? Interesting. To already use the patched version, that is?

allowing to run a nix recipe (including automatically building it) would be awesome.

The dots to be connected here are with fluidattacks/makes and the vector I guess is @kamadorueda ? Not discarding the possibility of a more generalized interface, of course.

EDIT: Do I need to worry about shadowing non-eligible store paths, so that an unauthorized intruder doesn't have immediate access to the entire Nix store? Or so they couldn't infer privileged information from what else is already built on that host?

blaggacao avatar Sep 10 '21 04:09 blaggacao

I forgot, there is one more catch: you have to specify PATH for the task for this to work, or use an absolute path for the command, since we stripped out all that relative-path string hacking. We mount /gnu/store (or /nix/store, as the case may be) inside the filesystem namespace; the "store" host volume is specified in the main nomad.conf. Finally, "/run/current-system" is added to the chroot_env in nomad.json as well, so that symlink is copied into all of the chroots. The $PATH on the host is then basically the same as in the container, and I can just say command = "etcd" and get the same etcd as whichever one is current on my host environment.

This last part, the etcd being whichever one is on the host, is not ideal. It would be much better to just specify a Guix/Nix package manifest and have it run that exact version of etcd, so we can control which version of the program is rolling out.
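For reference, the client-side configuration this describes would look roughly like the following. This is a sketch reconstructed from the description, not the actual file:

```hcl
client {
  host_volume "store" {
    path      = "/gnu/store"
    read_only = true
  }
  chroot_env {
    # copied into every chroot so host $PATH entries resolve inside it
    "/run/current-system" = "/run/current-system"
  }
}
```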

  task "etcd" {
      driver = "exec"

      env {
        PATH = "/run/current-system/profile/bin"
      }

      user = "root"

      config {
        command = "etcd"
        args = ["--data-dir", "/var/etcd"]
        pid_mode = "private"
        ipc_mode = "private"
      }

      resources { 
        cpu = 500
        memory = 512
      }

      volume_mount {
        volume      = "store"
        destination = "/gnu/store"
      }

      volume_mount {
        volume      = "data"
        destination = "/var/etcd"
      }
    }

sundbry avatar Sep 10 '21 05:09 sundbry

Closed by https://github.com/hashicorp/nomad/pull/14851 which will ship in Nomad 1.4.3

tgross avatar Nov 18 '22 20:11 tgross

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Mar 19 '23 02:03 github-actions[bot]