
Consul Connect service health checks not accessible?

Open evandam opened this issue 4 years ago • 10 comments

Nomad version

Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

Operating system and Environment details

Ubuntu 18.04

Issue

When running a service that binds a port locally (e.g. 127.0.0.1:8080), it seems that Consul health checks cannot reach it, and I'm unable to use options like expose or address_mode to work around this.

I would expect this to be a pretty common approach, if I understand correctly (binding to localhost avoids exposing ports that could be accessed outside of Consul Connect). Could the guides/docs add steps for health checks to https://www.nomadproject.io/docs/integrations/consul-connect?

Reproduction steps

Using the following job, try adding expose = true or address_mode = "driver" to the check and note the errors.

With expose = true:

❯ nomad job run debug/python_http.hcl
Error submitting job: Unexpected response code: 500 (error in job mutator expose-check: unable to determine local service port for service check app->python-http->python-http-health)

This happens even if I pass port = "8080" in the check configuration.

With address_mode = "driver":

The job is deployed, but the task fails with the following log:

failed to setup alloc: pre-run hook "group_services" failed: error getting address for check "python-http-health": cannot use address_mode="driver": no driver network exists
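For context, this is an editor's sketch, not part of the original report: address_mode = "driver" reads the check address from a driver-managed network, which the Docker driver only creates when it handles networking itself (the legacy pre-0.10 task-level pattern below, with port_map in the driver config). With group-level network mode = "bridge", Nomad manages the network namespace instead, so no driver network exists, which matches the error above. The port label and value here are assumptions.

```hcl
# Hypothetical legacy-style task networking where a driver network would exist,
# making address_mode = "driver" resolvable for the check:
task "python-http" {
  driver = "docker"

  config {
    image = "python:3"
    port_map {
      http = 8080        # container port mapped by the Docker driver
    }
  }

  resources {
    network {
      port "http" {}     # deprecated task-level network (pre-0.10 style)
    }
  }
}
```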

Job file (if appropriate)

job "python-http" {
  datacenters = ["kitchen"]

  group "app" {
    network {
      mode = "bridge"
      port "http" {}
    }

    task "python-http" {
      driver = "docker"

      config {
        image = "python:3"
        command = "python3"
        args = [
          "-m",
          "http.server",
          "-b",
          "127.0.0.1",
          "${NOMAD_PORT_http}",
        ]
      }

      env {
        PYTHONUNBUFFERED = "1"
      }

      resources {
        cpu = 20
        memory = 100
      }
    }

    service {
      name = "python-http"
      port = "http"

      check {
        type     = "http"
        name     = "python-http-health"
        path     = "/"
        interval = "10s"
        timeout  = "3s"
        # address_mode = "driver"
        # expose = "true"
      }

      connect {
        sidecar_service {}
      }
    }
  }
}

evandam avatar Jan 27 '21 20:01 evandam

After a decent amount of trial and error, it looks like the issue is with named ports as opposed to hard-coded ports.
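To make the distinction concrete, here is an editor's sketch (not from the original report) of the variant that avoids the mutator error, following the pattern that worked: the service port is hard-coded to the task's listen port, while a dynamic port is reserved only for the exposed check. The label "healthcheck" and the value 8080 are assumptions.

```hcl
group "app" {
  network {
    mode = "bridge"
    port "healthcheck" {}      # dynamic port used only for the exposed check
  }

  service {
    name = "python-http"
    port = "8080"              # hard-coded: Envoy routes here inside the netns

    check {
      type     = "http"
      name     = "python-http-health"
      path     = "/"
      port     = "healthcheck" # a named port is fine here, combined with expose
      expose   = true
      interval = "10s"
      timeout  = "3s"
    }

    connect {
      sidecar_service {}
    }
  }

  # ...task unchanged, except binding 127.0.0.1:8080 instead of ${NOMAD_PORT_http}
}
```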

I'm not sure if this is a bug or expected behavior, but it's certainly confusing. Any chance docs could capture this either way?

evandam avatar Jan 27 '21 22:01 evandam

@evandam given you're running in the mesh, is there a reason you aren't using hard-coded ports? Since it's all internal, there's no chance of conflict. Here's an example of how we're doing it:

 group "<redacted>-group" {
    count = [[ .api.count ]]

    constraint {
      attribute = "${meta.general_compute_linux}"
      value     = "true"
    }
    
    network {
      mode = "bridge"
    port "exposed" {}
    }

    service {
      name         = "<redacted>"
      tags         = [ "http" ]
      port         = "9090"
      check {
        expose   = true
        type     = "http"
        port     = "exposed"
        path     = "/hc"
        interval = "10s"
        timeout  = "5s"
      }
      
      connect {
        sidecar_service {
          proxy {}
        }
      }
    }

and our task (snipped)

task "<redacted>" {
   driver = "docker"
  
   config {
     image        = "<redacted>"
     volumes      = [
       "local/overrides:/app/overrides"
     ]
     cpu_hard_limit = true
   }

   env {
     ASPNETCORE_URLS         = "http://+:9090"
   }

   resources {
     cpu    = [[ .api.resources.cpu ]] # Mhz
     memory = [[ .api.resources.memory ]] # MB
   }
 }
}

idrennanvmware avatar Jan 28 '21 23:01 idrennanvmware

Hey @idrennanvmware, now that I know this was the issue, there's no hard requirement to use named ports; I just generally like them for readability. I also wouldn't have expected the behavior to differ between named and hard-coded ports, so it's a point of confusion.

evandam avatar Jan 28 '21 23:01 evandam

Hey @evandam! Thanks for raising the issue.

What do you think about the following update?

The port in the service stanza is the port the API service listens on. The
Envoy proxy will automatically route traffic to that port inside the network
-namespace.
+namespace. Note that this cannot be a named port; it must be a hard-coded port
+value.
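As an editor's illustration of the proposed wording, using the service from this issue (the 8080 value is an assumption):

```hcl
# Fails: a named port label cannot be resolved by the expose-check job mutator.
service {
  port = "http"
}

# Works: a hard-coded value, which Envoy routes to inside the network namespace.
service {
  port = "8080"
}
```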

krishicks avatar Feb 08 '21 16:02 krishicks

Sounds good to me, thanks!

evandam avatar Feb 08 '21 17:02 evandam

This explains my issue here.

Thanks for making it clear

xeroc avatar Mar 10 '21 09:03 xeroc

https://github.com/hashicorp/nomad/pull/10225 will fix the docs, and I'm going to keep this issue open as a feature request to fix.

tgross avatar Mar 24 '21 20:03 tgross

Any timeline on this fix at the moment? It's a real pain not being able to use dynamic ports in service definitions.

mircea-c avatar Sep 30 '21 00:09 mircea-c

Any updates on this?

Oloremo avatar Jul 15 '22 10:07 Oloremo

I've noticed that dynamic port labels can be used without causing any errors (granted I still have errors, but I think they're unrelated). Is this expected now?

bradydean avatar Dec 24 '22 03:12 bradydean

As of October 2023, the workaround documented here seems to enable usage of dynamic ports: https://discuss.hashicorp.com/t/port-mapping-with-nomad-and-consul-connect/16738/5

ElectroTiger avatar Oct 09 '23 03:10 ElectroTiger