support `SO_REUSEPORT` for safe reuse of static ports

Open victorstewart opened this issue 1 year ago • 2 comments

on https://developer.hashicorp.com/nomad/docs/job-specification/network#host_network

it states...

Since multiple services cannot share a port, the port must be open in order to place your task.

but this is false: with host networking and `SO_REUSEPORT`, multiple processes can bind the same port. docker supports this, and i assume the containerd plugin does as well.

just wanted to confirm Nomad doesn't refuse to start jobs that all use the same static port.

victorstewart, Dec 07 '22 18:12

Hi @victorstewart! Unfortunately the docs aren't incorrect here, as the scheduler checks for port collisions. So if you run a job like the following:

job with static port
job "httpd" {
  datacenters = ["dc1"]

  group "web" {

    network {
      mode = "host"
      port "www" {
        static = 8001
      }
    }

    task "http" {

      driver = "docker"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-v", "-f", "-p", "8001", "-h", "/local"]
        ports   = ["www"]
      }

      template {
        data        = "<html>hello, world</html>"
        destination = "local/index.html"
      }

      resources {
        cpu    = 128
        memory = 128
      }

    }
  }
}

That'll run fine, and then if you run the same job with a different ID the scheduler will reject it:

$ nomad job plan ./example.nomad
+ Job: "httpd2"
+ Task Group: "web" (1 create)
  + Task: "http" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "web" (failed to place 1 allocation):
    * Resources exhausted on 1 nodes
    * Class "local" exhausted on 1 nodes
    * Dimension "network: reserved port collision www=8001" exhausted on 1 nodes

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 ./example.nomad

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

The SO_REUSEPORT use case is obviously important for high availability, but the behavior also depends not only on the task driver but on the workload itself. I'm going to retitle this issue and relabel it as a feature request, but I don't have a solid idea of how we'd change the behavior in a way that still does what users who aren't using SO_REUSEPORT expect.

tgross, Jan 03 '23 16:01

okay no worries. between now and then i ended up writing my own machine orchestrator, program scheduler and container runtime... just ended up being the path of least resistance for me for many reasons. so i ended up completely sidestepping this issue.

but i do think this is an important feature.

take an array of QUIC programs that use a load balancer to assist connection establishment, then switch to their unique unicast address. all of these servers must run on the public (read: external facing) 443 port for most client firewalls, at whatever waypoint, to reliably allow the UDP traffic through. but this pattern would be impossible in Nomad today.

(not everyone load balances or proxies all traffic through CNI meshes).

best of luck!

victorstewart, Jan 03 '23 16:01

In our infrastructure, because 443 is generally only used by HAProxy for load balancing, we opted to NOT register the port with Nomad at all but to still have the application listen on 443. We also created tags that other jobs (that might want to use 443) can use as a negative constraint to avoid scheduling on nodes with HAProxy.

By doing the above, we can take advantage of SO_REUSEPORT for more seamless load balancer reloads.
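For anyone wanting to try the same workaround, here is a rough sketch of what it could look like. This is not our actual configuration: the job names, the image tag, and the `haproxy` meta key are all made up for illustration, and it assumes the "tags" are implemented as client node metadata. The HAProxy configuration itself is left out.

```hcl
# --- Nomad client agent config on load balancer nodes (assumed meta key) ---
client {
  meta {
    haproxy = "true" # hypothetical tag marking nodes that run HAProxy on 443
  }
}

# --- haproxy.nomad: listens on 443 but does NOT declare it to Nomad ---
job "haproxy" {
  datacenters = ["dc1"]

  # Place only on nodes carrying the tag.
  constraint {
    attribute = "${meta.haproxy}"
    value     = "true"
  }

  group "lb" {
    # No network/port block for 443, so the scheduler never tracks the port
    # and never reports a "reserved port collision" for it.

    task "haproxy" {
      driver = "docker"

      config {
        image        = "haproxy:2.8" # illustrative image tag
        network_mode = "host"        # process binds 443 directly on the host
      }
    }
  }
}

# --- other.nomad: a job that wants static port 443 and must avoid HAProxy nodes ---
job "other" {
  datacenters = ["dc1"]

  # Negative constraint: skip nodes tagged as running HAProxy.
  constraint {
    attribute = "${meta.haproxy}"
    operator  = "!="
    value     = "true"
  }

  group "app" {
    network {
      port "https" {
        static = 443
      }
    }

    # task definition omitted
  }
}
```

The trade-off is that Nomad has no knowledge of the unregistered port, so the constraints are the only thing keeping other users of 443 off those nodes.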

joliver, Jul 18 '23 17:07