
Java driver should not fingerprint if java not found in exec chroot

Open c16a opened this issue 3 years ago • 5 comments

Nomad version

1.1.1

Operating system and Environment details

Red Hat Linux 8.3

Issue

When I deploy a Java workload, the job is allocated to one of the Nomad clients with the Java driver activated, but it fails to run with the error below. Note that app is the name of my task.

failed to launch command with executor: rpc error: code = Unknown desc = file /var/openjdk/bin/java not found under path /opt/nomad/alloc/49280540-a500-55bd-f437-fdd497467ddc/app

Reproduction steps

  1. Extract a binary distribution of the JDK into /var/openjdk
  2. Run a Nomad client with the below configuration

# Excerpt from config.hcl
client {
  enabled = true

  server_join {
    retry_join = [ "provider=aws tag_key=Name tag_value=nomad-server" ]
    retry_max = 0
    retry_interval = "15s"
  }

  options = {
    "driver.allowlist" = "podman,java"
  }
}

plugin "podman" {

}

plugin "java" {

}

Below is the systemd unit file

[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2

# Removing this line starts nomad, but doesn't activate java driver
Environment="PATH=$PATH:/var/openjdk/bin"

TasksMax=infinity
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target

Expected Result

  1. Job is allocated to a Nomad client with Java driver activated
  2. Job is scheduled and runs normally

Actual Result

Job is allocated, but cannot start. The error below is thrown:

failed to launch command with executor: rpc error: code = Unknown desc = file /var/openjdk/bin/java not found under path /opt/nomad/alloc/49280540-a500-55bd-f437-fdd497467ddc/app

Update

Adding /var to the client's chroot with the below configuration gives me a new error:

  chroot_env {
    "/bin" = "/bin"
    "/etc" = "/etc"
    "/lib" = "/lib"
    "/lib32" = "/lib32"
    "/lib64" = "/lib64"
    "/run/resolvconf" = "/run/resolvconf"
    "/sbin" = "/sbin"
    "/usr" = "/usr"
    "/var" = "/var"
  }

[ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=f5911e98-0101-f597-c79f-d573594b7ea4 task=app error="failed to launch command with executor: rpc error: code = Unknown desc = container_linux.go:367: starting container process caused: exec: "/var/openjdk/bin/java": stat /var/openjdk/bin/java: permission denied"

Job file (if appropriate)

A typical example Java job

c16a avatar Jun 13 '21 08:06 c16a

@c16a It would be nice if you updated the ticket with the actual error you have now; otherwise triagers would have to answer something that you already fixed…

apollo13 avatar Jun 14 '21 09:06 apollo13

Very interesting. That's a bug indeed in the Java driver - it's currently failing because /var/openjdk is not in the chroot, so the binary isn't accessible within the task container Nomad creates.

In the short term, I would recommend installing openjdk in a folder in the default chroot, e.g. /usr/local/openjdk; or updating the client's chroot to include /var/openjdk.
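As a rough sketch of the second option, assuming the client block from the reproduction steps: the mapping below adds only /var/openjdk rather than all of /var. The other entries are the same ones used in the chroot_env example elsewhere in this issue, repeated here on the assumption that setting chroot_env replaces the default mapping instead of extending it. This is untested against the reporter's setup.

```hcl
# Sketch only: excerpt from the client configuration, not a verified fix.
client {
  enabled = true

  chroot_env {
    # Entries carried over from the chroot_env example in this issue.
    "/bin"            = "/bin"
    "/etc"            = "/etc"
    "/lib"            = "/lib"
    "/lib32"          = "/lib32"
    "/lib64"          = "/lib64"
    "/run/resolvconf" = "/run/resolvconf"
    "/sbin"           = "/sbin"
    "/usr"            = "/usr"

    # Assumption: exposing just the JDK directory is enough for the
    # fingerprinted path /var/openjdk/bin/java to exist inside the task chroot.
    "/var/openjdk"    = "/var/openjdk"
  }
}
```

The client needs a restart for the change to take effect. File permissions on the JDK directory still apply inside the chroot, so this mapping alone may not be sufficient.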

I'll leave the bug open: we should report a better error, and the Java fingerprint should report an error if the java binary is not in the chroot environment.

notnoop avatar Jun 15 '21 14:06 notnoop

I've added /var to the chroot with the below configuration, but the issue still persists. SELinux was enforcing, so I switched it to permissive mode with setenforce 0, but no luck.

  chroot_env {
    "/bin" = "/bin"
    "/etc" = "/etc"
    "/lib" = "/lib"
    "/lib32" = "/lib32"
    "/lib64" = "/lib64"
    "/run/resolvconf" = "/run/resolvconf"
    "/sbin" = "/sbin"
    "/usr" = "/usr"
    "/var" = "/var"
  }

c16a avatar Jun 16 '21 00:06 c16a

@c16a It would be nice if you updated the ticket with the actual error you have now; otherwise triagers would have to answer something that you already fixed…

Sorry, missed it. Updated the issue here now.

c16a avatar Jun 16 '21 01:06 c16a

Hello, guys. I faced the same issue, but with a custom binary and the exec driver. The deployment succeeded after I moved the binary to the /usr/local folder. The error was:

failed to launch command with executor: rpc error: code = Unknown desc = file /home/cored/bin/linux/cored not found under path /opt/nomad/alloc/7d287a4a-ec25-e3a7-d22c-126cb96c31b1/cored

wusikijeronii avatar Oct 10 '22 18:10 wusikijeronii