Nomad Consul connect and Traefik

Open suikast42 opened this issue 3 years ago • 5 comments

If I deploy the whoami job as a standard job, then everything is fine with Nomad and Traefik.

I see my own certificate (static, no ACME) in the web browser. But when I deploy the same container with a Consul Connect sidecar, Traefik returns a Bad Gateway error.

As far as I can see from the Traefik logs, Traefik does its job.

time="2022-09-13T19:22:11Z" level=debug msg="Serving default certificate for request: "whoami.cloud.private""
time="2022-09-13T19:22:11Z" level=debug msg="vulcand/oxy/roundrobin/rr: begin ServeHttp on request" Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Proto":"HTTP/2.0","ProtoMajor":2,"ProtoMinor":0,"Header":{"Accept":["/"],"User-Agent":["curl/7.81.0"],"X-Forwarded-Host":["whoami.cloud.private"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"],"X-Forwarded-Server":["worker-01"],"X-Real-Ip":["10.21.21.41"]},"ContentLength":0,"TransferEncoding":null,"Host":"whoami.cloud.private","Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.21.21.41:48760","RequestURI":"/","TLS":null}"
time="2022-09-13T19:22:11Z" level=debug msg="vulcand/oxy/roundrobin/rr: Forwarding this request to URL" Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Proto":"HTTP/2.0","ProtoMajor":2,"ProtoMinor":0,"Header":{"Accept":["/"],"User-Agent":["curl/7.81.0"],"X-Forwarded-Host":["whoami.cloud.private"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"],"X-Forwarded-Server":["worker-01"],"X-Real-Ip":["10.21.21.41"]},"ContentLength":0,"TransferEncoding":null,"Host":"whoami.cloud.private","Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.21.21.41:48760","RequestURI":"/","TLS":null}" ForwardURL="https://10.21.21.42:25797"
time="2022-09-13T19:22:11Z" level=debug msg="'502 Bad Gateway' caused by: EOF"
time="2022-09-13T19:22:11Z" level=debug msg="vulcand/oxy/roundrobin/rr: completed ServeHttp on request" Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Proto":"HTTP/2.0","ProtoMajor":2,"ProtoMinor":0,"Header":{"Accept":["/"],"User-Agent":["curl/7.81.0"],"X-Forwarded-Host":["whoami.cloud.private"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"],"X-Forwarded-Server":["worker-01"],"X-Real-Ip":["10.21.21.41"]},"ContentLength":0,"TransferEncoding":null,"Host":"whoami.cloud.private","Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.21.21.41:48760","RequestURI":"/","TLS":null}"
time="2022-09-13T19:22:14Z" level=debug msg="Filtering disabled item" providerName=consulcatalog serviceName=consul
time="2022-09-13T19:22:14Z" level=debug msg="Filtering disabled item" providerName=consulcatalog serviceName=nomad
time="2022-09-13T19:22:14Z" level=debug msg="Filtering disabled item" providerName=consulcatalog serviceName=nomad-client
time="2022-09-13T19:22:14Z" level=debug msg="Filtering disabled item" providerName=consulcatalog serviceName=traefik
time="2022-09-13T19:22:14Z" level=debug msg="Configuration received: {"http":{"routers":{"whoami":{"service":"whoami","rule":"Host(whoami.cloud.private)","tls":{}}},"services":{"whoami":{"loadBalancer":{"servers":[{"url":"https://10.21.21.42:25797"},{"url":"https://10.21.21.42:24647"}],"passHostHeader":true,"serversTransport":"tls-default-nomadder1-whoami"}}},"serversTransports":{"tls-default-nomadder1-whoami":{"serverName":"default-nomadder1-whoami","insecureSkipVerify":true,"peerCertURI":"spiffe:///ns/default/dc/nomadder1/svc/whoami"}}},"tcp":{},"udp":{}}" providerName=consulcatalog
time="2022-09-13T19:22:14Z" level=debug msg="Skipping unchanged configuration." providerName=consulcatalog

whoami.nomad
``` hcl
job "whoami" {
  datacenters = ["nomadder1"]

  group "whoami" {
    count = 2

    network {
      mode = "bridge"

      port "web" {}
    }

    service {
      name = "whoami"
      port = "web"

      connect {
        sidecar_service {}
      }

      tags = [
        "traefik.enable=true",
        "traefik.consulcatalog.connect=true",
        "traefik.http.routers.whoami.tls=true",
        "traefik.http.routers.whoami.rule=Host(`whoami.cloud.private`)",
      ]

      check {
        type     = "http"
        path     = "/health"
        port     = "web"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "whoami" {
      driver = "docker"

      config {
        image = "traefik/whoami"
        ports = ["web"]
        args  = ["--port", "${NOMAD_PORT_web}"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
```


traefik.nomad
``` hcl 
job "traefik" {
  datacenters = ["nomadder1"]
  type        = "system"

  group "traefik" {
    network {
      port "web" {
        static = 80
      }

      port "websecure" {
        static = 443
      }
    }

    service {
      name = "traefik"
      port = "web"

      check {
        type     = "http"
        path     = "/ping"
        port     = "web"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "traefik" {
      driver = "docker"

      config {
        image        = "traefik:v2.8.4"
        network_mode = "host"

        volumes = [
          "local/traefik.yaml:/etc/traefik/traefik.yaml",
        ]
      }

      template {
        data = <<EOF
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
  traefik:
    address: ":8081"
api:
  dashboard: true
  insecure: true
  debug: false
ping:
  entryPoint: "web"

log:
  level: "DEBUG"
providers:
  consulCatalog:
    prefix: "traefik"
    exposedByDefault: false
    endpoint:
      address: "127.0.0.1:8500"
      scheme: "http"
    connectAware: true
EOF

        destination = "local/traefik.yaml"
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
```



### Nomad version
Nomad v1.3.5
### Consul version
Consul v1.13.1
### Traefik version
2.8.4



suikast42 avatar Sep 13 '22 19:09 suikast42

It seems that something is lacking in the ACL permissions.

If I change the default_policy from deny to allow, then it works, even with TLS fully enabled.

The strange thing is that the token in use is the global-management token.

global-management-token
``` hcl
acl = "write"
agent_prefix "" {
  policy = "write"
}
event_prefix "" {
  policy = "write"
}
key_prefix "" {
  policy = "write"
}
keyring = "write"
node_prefix "" {
  policy = "write"
}
operator = "write"
mesh = "write"
peering = "write"
query_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "write"
  intentions = "write"
}
session_prefix "" {
  policy = "write"
}
```

``` json
"acl": {
  "enabled": true,
  "default_policy": "allow",
  "enable_token_persistence": true,
  "tokens": {
    "default": "e95b599e-166e-7d80-08ad-aee76e7ddf19",
    "initial_management": "e95b599e-166e-7d80-08ad-aee76e7ddf19",
    "agent": "e95b599e-166e-7d80-08ad-aee76e7ddf19"
  }
}
```



suikast42 avatar Sep 14 '22 08:09 suikast42

Hi @suikast42

Thanks for reporting this issue. I'll try to take a look at it this week. The update you provided with the ACL config is really helpful.

DerekStrickland avatar Sep 14 '22 11:09 DerekStrickland

@suikast42 If setting the Consul ACL policy to default_policy: "allow" fixes the problem, that's likely a sign you are missing a Consul intention enabling a connection between the two Connect services.

https://www.consul.io/docs/connect/intentions
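
For illustration, such an intention could be expressed as a `service-intentions` config entry along the lines of the sketch below. This is only a sketch under assumptions: it assumes Traefik presents itself to the mesh under the service name `traefik` (the name registered in the traefik.nomad job above) and that `whoami` is the destination. The file name in the comment is hypothetical.

``` hcl
# whoami-intentions.hcl (hypothetical file name)
# Allows the source service "traefik" to open Connect
# connections to the destination service "whoami".
# Assumption: Traefik's Connect identity is the service name "traefik".
Kind = "service-intentions"
Name = "whoami"
Sources = [
  {
    Name   = "traefik"
    Action = "allow"
  }
]
```

A file like this would typically be applied with `consul config write whoami-intentions.hcl`, using a token that has `intentions = "write"` on the destination service. With `default_policy` set to `deny`, connections between Connect services are denied unless an intention like this allows them, which would be consistent with the behaviour described in this issue.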

shoenig avatar Sep 16 '22 14:09 shoenig

> @suikast42 If setting the Consul ACL policy to default_policy: "allow" fixes the problem, that's likely a sign you are missing a Consul intention enabling a connection between the two Connect services.
>
> https://www.consul.io/docs/connect/intentions

I have this suspicion too.

By the way, I have activated the auto_config option.

With auto_config, Consul generates the ACL policy shown below:


``` hcl
service "whoami" {
  policy = "write"
}
service "whoami-sidecar-proxy" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
```

And indeed, I see no ACL rule anywhere that relates to intentions.

suikast42 avatar Sep 17 '22 11:09 suikast42

@suikast42 I'm not a Consul Connect expert but it doesn't look to me like auto_config sets up intentions at all. Can you try setting up intentions appropriately as @shoenig has recommended, and then report back?

tgross avatar Oct 04 '22 18:10 tgross

We haven't heard back on whether setting the intentions helped, so I'm going to close this issue out. Please feel free to add additional information if you have it. Thanks!

tgross avatar Nov 30 '22 21:11 tgross

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Mar 31 '23 02:03 github-actions[bot]