Consul Connect service health checks not accessible?
Nomad version
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)
Operating system and Environment details
Ubuntu 18.04
Issue
When running a service that binds a port locally (e.g. 127.0.0.1:8080), it seems that Consul health checks cannot access it, and I'm unable to use options like expose or address_mode.
I would expect this to be a pretty common approach if I understand correctly (to avoid leaking ports that could be accessed outside of Consul Connect). Can the guides/docs add steps for health checks in https://www.nomadproject.io/docs/integrations/consul-connect?
Reproduction steps
Using the following job, try adding expose = true or address_mode = "driver" to the check and note the errors.
With expose = true:
❯ nomad job run debug/python_http.hcl
Error submitting job: Unexpected response code: 500 (error in job mutator expose-check: unable to determine local service port for service check app->python-http->python-http-health)
This happens even if I pass port = "8080" in the check configuration.
With address_mode = "driver":
The job is deployed, but the task fails with the following log:
failed to setup alloc: pre-run hook "group_services" failed: error getting address for check "python-http-health": cannot use address_mode="driver": no driver network exists
Job file (if appropriate)
job "python-http" {
  datacenters = ["kitchen"]

  group "app" {
    network {
      mode = "bridge"
      port "http" {}
    }

    task "python-http" {
      driver = "docker"

      config {
        image   = "python:3"
        command = "python3"
        args = [
          "-m",
          "http.server",
          "-b",
          "127.0.0.1",
          "${NOMAD_PORT_http}",
        ]
      }

      env {
        PYTHONUNBUFFERED = "1"
      }

      resources {
        cpu    = 20
        memory = 100
      }
    }

    service {
      name = "python-http"
      port = "http"

      check {
        type     = "http"
        name     = "python-http-health"
        path     = "/"
        interval = "10s"
        timeout  = "3s"
        # address_mode = "driver"
        # expose = "true"
      }

      connect {
        sidecar_service {}
      }
    }
  }
}
After a decent amount of trial and error, it looks like the issue is caused by using a named port in the service instead of a hard-coded port.
I'm not sure if this is a bug or expected behavior, but it's certainly confusing. Any chance docs could capture this either way?
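For anyone landing here with the same error, this is a minimal, untested sketch of the job above adapted to a hard-coded service port. It assumes the application can listen on a fixed port (8080 here) inside the bridge network namespace; the port label "healthcheck" is a name chosen for illustration, not something Nomad requires.

```hcl
job "python-http" {
  datacenters = ["kitchen"]

  group "app" {
    network {
      mode = "bridge"
      # Dynamic port used only by the exposed health check listener.
      # "healthcheck" is an illustrative label.
      port "healthcheck" {}
    }

    service {
      name = "python-http"
      # Hard-coded port the app listens on inside the network namespace
      # (assumption: a named port here is what triggers the error).
      port = "8080"

      check {
        type     = "http"
        name     = "python-http-health"
        path     = "/"
        port     = "healthcheck"
        interval = "10s"
        timeout  = "3s"
        expose   = true
      }

      connect {
        sidecar_service {}
      }
    }

    task "python-http" {
      driver = "docker"

      config {
        image   = "python:3"
        command = "python3"
        # Bind to loopback on the same fixed port the service declares.
        args = ["-m", "http.server", "-b", "127.0.0.1", "8080"]
      }
    }
  }
}
```

The only changes from the original job are the literal service port, the fixed bind port in args, and a separate dynamic port label for the check.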
@evandam given you're running in the mesh, is there a reason you aren't using hard-coded ports? Since it's all internal, there's no chance of conflict. Here's an example of how we're doing it:
group "<redacted>-group" {
  count = [[ .api.count ]]

  constraint {
    attribute = "${meta.general_compute_linux}"
    value     = "true"
  }

  network {
    mode = "bridge"
    port "exposed" {}
  }

  service {
    name = "<redacted>"
    tags = [ "http" ]
    port = "9090"

    check {
      expose   = true
      type     = "http"
      port     = "exposed"
      path     = "/hc"
      interval = "10s"
      timeout  = "5s"
    }

    connect {
      sidecar_service {
        proxy {}
      }
    }
  }
and our task (snipped)
  task "<redacted>" {
    driver = "docker"

    config {
      image = "<redacted>"
      volumes = [
        "local/overrides:/app/overrides"
      ]
      cpu_hard_limit = true
    }

    env {
      ASPNETCORE_URLS = "http://+:9090"
    }

    resources {
      cpu    = [[ .api.resources.cpu ]]    # MHz
      memory = [[ .api.resources.memory ]] # MB
    }
  }
}
Hey @idrennanvmware, after learning this was the issue there's not necessarily a requirement to use named ports, but generally I like using them for readability. I also wouldn't have expected the behavior to be different when using named/hard-coded ports, so it just seems like a point of confusion.
Hey @evandam! Thanks for raising the issue.
What do you think about the following update?
The port in the service stanza is the port the API service listens on. The
Envoy proxy will automatically route traffic to that port inside the network
-namespace.
+namespace. Note that this cannot be a named port; it must be a hard-coded port
+value.
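To illustrate the constraint in that note, here is a hedged sketch of the explicit expose configuration on the sidecar proxy, which is roughly what the expose = true shorthand on a check is meant to generate. It assumes a service listening on the hard-coded port 8080 and a dynamic port labeled "healthcheck" declared in the group's network stanza; both values are illustrative and the fragment has not been tested against this issue.

```hcl
service {
  name = "python-http"
  # Hard-coded port, per the documentation note above.
  port = "8080"

  check {
    type     = "http"
    name     = "python-http-health"
    path     = "/"
    # The check targets the exposed listener, not the app port.
    port     = "healthcheck"
    interval = "10s"
    timeout  = "3s"
  }

  connect {
    sidecar_service {
      proxy {
        expose {
          path {
            path            = "/"
            protocol        = "http"
            # Port the app listens on inside the network namespace.
            local_path_port = 8080
            # Port label from the group network stanza (illustrative).
            listener_port   = "healthcheck"
          }
        }
      }
    }
  }
}
```

Spelling out local_path_port as a number makes it visible why a literal port is needed: the proxy has to know the concrete in-namespace port to forward the check to.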
Sounds good to me, thanks!
https://github.com/hashicorp/nomad/pull/10225 will fix the docs, and I'm going to keep this issue open as a feature request to fix.
Any timeline on this fix at the moment? It's a real pain not being able to use dynamic ports in service definitions.
Any updates on this?
I've noticed that dynamic port labels can be used without causing any errors (granted I still have errors, but I think they're unrelated). Is this expected now?
As of October 2023, the workaround documented here seems to enable usage of dynamic ports: https://discuss.hashicorp.com/t/port-mapping-with-nomad-and-consul-connect/16738/5