nomad-driver-podman

Support network bridge mode

drewbailey opened this issue on Jun 29, 2020 • 8 comments

The driver currently supports bridge network mode via the task config, but not at the driver and task group level.
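For reference, the task-level form looks roughly like the sketch below (assuming the driver's network_mode task config option; the image is just a placeholder), whereas this issue asks for the group-level network { mode = "bridge" } form used in the Connect demo that follows:

    task "web" {
      driver = "podman"

      config {
        # Placeholder image; network_mode here is the driver's task-level
        # option, not the group-level network block requested in this issue.
        image        = "hashicorpnomad/counter-api:v1"
        network_mode = "bridge"
      }
    }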

Support the connect demo

job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "podman"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8081
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "podman"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}

drewbailey, Jun 29, 2020

Due to this change, it does not seem possible to use Consul Connect with the podman driver.

maartenbeeckmans, Jan 8, 2022

Due to this change, it does not seem possible to use Consul Connect with the podman driver.

@maartenbeeckmans Have you been able to use podman with Connect? I'm only facing a missing drivers issue whenever I use podman with Connect.

deepbluemussel, Feb 16, 2022

I was not able to use it. I tried to modify the example by setting the task driver to podman and the sidecar_task driver to podman, but ran into several issues.

The biggest issue was the support for bridge mode at the group level instead of the task level, which is a requirement for Consul Connect, IIRC.

maartenbeeckmans, Feb 17, 2022

I struggled with making this example work too. Everything is set up correctly: with the docker driver it works, but not with Podman (which works correctly for regular jobs). Well, let's hope the maintainers find time to work on this big improvement.

deepbluemussel, Feb 17, 2022

The driver currently supports bridge network mode via the task config, but not at the driver and task group level.

If I didn't miss anything, this should be / is supported (or did you get any error messages related to this, or is it documented somewhere?). From the README:

By default the task uses the network stack defined in the task group, see the network stanza. If the group's network behavior is also undefined, it will fall back to bridge in rootful mode or slirp4netns for rootless containers.

  • bridge: create a network stack on the default podman bridge.

And the "Features" section even claims that Consul Connect is supported (which conflicts with this issue and my experience - at least in terms of a practical setup):

Support for nomad shared network namespaces and consul connect

However, it currently seems to be broken because the loopback interface ("lo") doesn't get initialized properly. This bug is tracked via https://github.com/hashicorp/nomad/issues/10014 and affects at least the "exec" and "podman" drivers. There are two hacky workarounds, though:

  1. Executing ip -n "$NS" link set lo up from the host (from the root network namespace / outside the Podman container). This can be automated via scripts (e.g., using inotify (example)). I did successfully test this approach with the "exec" driver and a Python script. (Note: The best way to get the Envoy binary is apparently to copy it from the container image.)
  2. Using an additional task with the "raw_exec" driver (less isolated, but apparently it also runs in the same network namespace); a rough sketch of this idea follows after this list. An example can be seen here: https://discuss.hashicorp.com/t/consul-connect-envoy-without-docker/4824/7 (but one needs to enable the "raw_exec" driver first and I didn't test it).
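A minimal, untested sketch of the second workaround, assuming (as noted above) that a raw_exec prestart task shares the group's bridge network namespace and can bring loopback up before the other tasks start. The group and task names are placeholders, and raw_exec must be enabled in the client configuration:

  group "api" {
    network {
      mode = "bridge"
    }

    # Hypothetical helper task: bring the loopback interface up inside the
    # shared network namespace before the workload and sidecar start.
    task "lo-up" {
      driver = "raw_exec"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        command = "ip"
        args    = ["link", "set", "lo", "up"]
      }
    }

    # ... the actual workload and Connect sidecar tasks go here ...
  }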

I'm only facing a missing drivers issue whenever I use podman with Connect.

That should be because of the following:

      connect {
        sidecar_service {}
      }

This uses the Docker driver by default:

  • https://www.nomadproject.io/docs/job-specification/connect
  • https://www.nomadproject.io/docs/job-specification/sidecar_task

The default Envoy task is equivalent to the configuration shown here: https://www.nomadproject.io/docs/job-specification/sidecar_task#default-envoy-configuration

The solution is to use the following:

      connect {
        sidecar_service {}
        sidecar_task {
          driver = "podman"
          config {
            image = "docker.io/envoyproxy/envoy:v1.21.1"
            # image = "localhost/envoy-podman:v1.21.1"
            command = "/docker-entrypoint.sh"
            args = [
              "-c",
              "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
              "-l",
              "${meta.connect.log_level}",
              "--concurrency",
              "${meta.connect.proxy_concurrency}",
              "--disable-hot-restart"
            ]
          }
        }
      }

However, in addition to the aforementioned issue with the loopback interface, I hit two more issues when using the Envoy image with Podman through Nomad:

  1. I got the chown: changing ownership of '/dev/std{out,err}': Permission denied errors (there are multiple upstream issues regarding this, e.g., https://github.com/envoyproxy/envoy/issues/14787). I worked around this by simply removing the two commands from docker-entrypoint.sh (the permissions are already fine).
  2. On "Rocky Linux 8.5 (Green Obsidian)" with SELinux enabled the container user couldn't access /secrets/envoy_bootstrap.json.

Anyway, the tl;dr is that I cannot recommend trying to get Consul Connect working with the "podman" (or "exec") driver at this point. It should be possible (I at least managed to get it working with the "exec" driver) but it isn't pretty/practical at all.

The most important blocker is https://github.com/hashicorp/nomad/issues/10014. After that, the Envoy container image needs to be improved to work with SELinux+Nomad+Podman (but IIRC it was working fine without Nomad (i.e., SELinux+Podman), so this might rather need fixes in nomad-driver-podman than a docker-entrypoint.sh workaround).

primeos-work, May 2, 2022

@primeos-work Tagged you in a comment on a discuss.hashicorp thread as well. Sorry if that's annoying.

Anyway, the tl;dr is that I cannot recommend trying to get Consul Connect working with the "podman" (or "exec") driver at this point. It should be possible (I at least managed to get it working with the "exec" driver) but it isn't pretty/practical at all.

Just to be sure, are you saying that it's not worthwhile to try and use the exec driver when in bridge mode?

l-monninger, Sep 16, 2022

No, I was just saying that IMO the Nomad + Podman + Consul Connect combination seemed like too much trouble to use at the time (especially in production). Now, with https://github.com/hashicorp/nomad/issues/10014 resolved, it should be fine(-ish). I guess we can close this issue now too? I did at least manage to get the Consul Connect demo working with the podman driver (not that pretty yet, but doable). And the network bridge mode should work with all drivers now (since/with https://github.com/hashicorp/nomad/pull/13428).

primeos-work, Sep 20, 2022

Huh, weird. In the situation referenced in that discuss.hashicorp post, I could not do anything network-based in the exec task. Do you have an example set of configs you could send my way?

l-monninger, Sep 20, 2022

Also running into the same driver issue when trying to use consul as the service provider.

ZackaryWelch, Feb 3, 2023

Also running into the same driver issue when trying to use consul as the service provider.

You need to grab the sidecar_task block from here and change the driver to podman: https://developer.hashicorp.com/nomad/docs/job-specification/sidecar_task

Consul Connect and bridge mode on a group work for me in the latest Nomad versions on Rocky 9.

p1u3o, Mar 30, 2023

Hi folks, starting with Nomad v1.6 and a nomad-driver-podman release shortly thereafter (ETA mid-June-ish), the Connect with Podman story should be significantly improved. The goal of https://github.com/hashicorp/nomad/issues/17042 is to make sure we can use podman and Connect jobs without extra configuration (other than specifying driver = "podman" and installing the podman task driver).

So far things look good with Ubuntu 22.04 and Podman v3.4.4. I still need to verify the RHEL and Podman 4 side of things. If there are additional issues, we can track them in the ticket above. I believe https://github.com/hashicorp/nomad/pull/13428 / https://github.com/hashicorp/nomad/issues/10014 resolved the bridge mode issues, so I'll go ahead and close out this ticket.
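Assuming that work lands as planned, the connect block from the demo job at the top of this issue should then work as written with driver = "podman", without the sidecar_task override shown earlier in this thread:

      connect {
        # With Nomad v1.6+ and a matching nomad-driver-podman release, the
        # default Envoy sidecar is expected to work without changing the
        # sidecar_task driver.
        sidecar_service {}
      }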

PS: Thanks to everyone who has helped by investigating or fixing issues; this driver has been a monumental community effort and wouldn't be possible without y'all!

shoenig, May 3, 2023

Does it only work with Nomad v1.6?

Allan-Nava, Jun 1, 2023