Consul connect services do not reconnect after booting up the cluster

Open suikast42 opened this issue 1 year ago • 0 comments

Operating system and Environment details

Nomad 1.6.0, CNI 1.6.0, Consul 1.20.0

Job file

job "countdash_app_mesh" {
  datacenters = ["nomadder1"]
  group "api" {
    count = 1
#    constraint {
#      distinct_hosts = true
#    }
#         constraint {
#           attribute    = "${attr.unique.hostname}"
#           set_contains = "worker-02"
#         }
    network {
      mode = "bridge"
      port "api" {
        to = 9001
#        host_network = "public"
      }
    }

    service {
      name = "count-api"
      port = "api"
      address_mode = "alloc"
      connect {
        sidecar_service {}
      }

      check {
        name     = "api_health"
        type     = "http"
        path     = "/health"
        port     = "api"
        interval = "10s"
        timeout  = "2s"
        address_mode = "alloc"
      }

    }

    task "count-api" {
      driver = "docker"

      config {
        image = "hashicorpnomad/counter-api:v3"
        ports = ["api"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }

  group "dashboard" {
    count = 1
        # constraint {
        #   attribute    = "${attr.unique.hostname}"
        #   set_contains = "worker-01"
        # }
    network {
      mode = "bridge"

      port "http" {
        to = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"
      tags = [
        "traefik.enable=true",
        "traefik.consulcatalog.connect=true",
        "traefik.http.routers.count-dashboard.tls=true",
        "traefik.http.routers.count-dashboard.rule=Host(`count.cloud.private`)"
      ]

      connect {
        sidecar_service {
          proxy {
            #            config {
            #              protocol = "http"
            #            }
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        CONSUL_TLS_SERVER_NAME = "localhost"
        COUNTING_SERVICE_URL   = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v3"
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

If I deploy this job, everything is OK until I reboot my VMs.

After the VMs restart (1 worker and 1 master), the Connect services do not come up again.

Log of connect-dashboard:


[2024-10-21 11:06:15.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:173] dns resolution without records for tempo-zipkin.service.consul
[2024-10-21 11:06:15.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for tempo-zipkin.service.consul completed with status 0
[2024-10-21 11:06:15.193][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2024-10-21 11:06:20.188][1][debug][main] [source/server/server.cc:237] flushing stats
[2024-10-21 11:06:20.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:391] dns resolution for tempo-zipkin.service.consul started
[2024-10-21 11:06:20.197][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:173] dns resolution without records for tempo-zipkin.service.consul
[2024-10-21 11:06:20.197][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for tempo-zipkin.service.consul completed with status 0
[2024-10-21 11:06:20.197][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2024-10-21 11:06:20.511][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:264] [Tags: "ConnectionId":"25"] new tcp proxy session
[2024-10-21 11:06:20.511][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:459] [Tags: "ConnectionId":"25"] Creating connection to cluster local_app
[2024-10-21 11:06:20.511][15][debug][misc] [source/common/upstream/cluster_manager_impl.cc:2329] Allocating TCP conn pool
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"26"] connecting to 127.0.0.1:9002
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:1036] [Tags: "ConnectionId":"26"] connection in progress
[2024-10-21 11:06:20.511][15][debug][conn_handler] [source/common/listener_manager/active_tcp_listener.cc:160] [Tags: "ConnectionId":"25"] new connection from 172.21.2.20:34960
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"25"] closing socket: 0
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:669] cancelling pending stream
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:150] [Tags: "ConnectionId":"26"] closing data_to_write=0 type=1
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"26"] closing socket: 1
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"26"] client disconnected, failure reason: 
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 0 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:06:20.511][15][debug][conn_handler] [source/common/listener_manager/active_stream_listener_base.cc:136] [Tags: "ConnectionId":"25"] adding to cleanup list

Logs of the same instance after restarting Consul:

[2024-10-21 11:18:16.223][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"64"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.224][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"64"] disconnect. resetting 1 pending requests
[2024-10-21 11:18:16.224][1][debug][client] [source/common/http/codec_client.cc:159] [Tags: "ConnectionId":"64"] request reset
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:215] [Tags: "ConnectionId":"64"] destroying stream: 0 remaining
[2024-10-21 11:18:16.224][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"1431416887679520993"] upstream reset: reset reason: connection termination, transport failure reason: 
[2024-10-21 11:18:16.224][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:188] DeltaAggregatedResources gRPC config stream to local_agent closed: 13, 
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"64"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:16.425][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:16.425][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:16.425][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:16.425][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:16.425][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:16.425][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:16.425][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"109"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:16.426][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"109"] current connecting state: true
[2024-10-21 11:18:16.426][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"109"] connecting
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"109"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"109"] connected
[2024-10-21 11:18:16.426][1][debug][misc] [source/common/network/io_socket_error_impl.cc:64] Unknown error code 32 details Broken pipe
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"109"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"109"] closing socket: 0
[2024-10-21 11:18:16.426][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"109"] disconnect. resetting 0 pending requests
[2024-10-21 11:18:16.426][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"109"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] upstream reset: reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][http] [source/common/http/async_client_impl.cc:182] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end'

[2024-10-21 11:18:16.426][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:195] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.426][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:16.782][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:16.782][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:16.782][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:16.782][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:16.782][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:16.782][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:16.782][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"110"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:16.782][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"110"] current connecting state: true
[2024-10-21 11:18:16.782][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"110"] connecting
[2024-10-21 11:18:16.782][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"110"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"110"] connected
[2024-10-21 11:18:16.783][1][debug][misc] [source/common/network/io_socket_error_impl.cc:64] Unknown error code 32 details Broken pipe
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"110"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"110"] closing socket: 0
[2024-10-21 11:18:16.783][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"110"] disconnect. resetting 0 pending requests
[2024-10-21 11:18:16.783][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"110"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] upstream reset: reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][http] [source/common/http/async_client_impl.cc:182] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end'

[2024-10-21 11:18:16.783][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:232] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.783][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:18.409][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:18.409][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:18.409][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:18.409][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:18.410][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:18.410][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:18.410][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"111"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:18.410][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"111"] current connecting state: true
[2024-10-21 11:18:18.410][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"111"] connecting
[2024-10-21 11:18:18.410][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"111"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:18.410][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"111"] connected
[2024-10-21 11:18:18.415][1][debug][client] [source/common/http/codec_client.cc:88] [Tags: "ConnectionId":"111"] connected
[2024-10-21 11:18:18.415][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:328] [Tags: "ConnectionId":"111"] attaching to next stream
[2024-10-21 11:18:18.415][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:182] [Tags: "ConnectionId":"111"] creating stream
[2024-10-21 11:18:18.415][1][debug][router] [source/common/router/upstream_request.cc:593] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] pool ready
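
To make the retry pattern in the log above easier to read, here is a minimal Python sketch that counts how often the ADS stream to `local_agent` was closed per gRPC status code, and reports the timestamp of the first `pool ready` line. The function name `summarize_ads_log` and the regular expressions are assumptions based only on the line formats shown above; they are not part of Nomad, Consul, or Envoy.

```python
import re
from collections import Counter

# Assumed line formats, taken from the Envoy debug output in this issue.
CLOSED_RE = re.compile(r"gRPC config stream to local_agent closed: (\d+)")
READY_RE = re.compile(r"\bpool ready\b")
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def summarize_ads_log(lines):
    """Count ADS stream closures per gRPC status and find the first 'pool ready'."""
    closed = Counter()
    first_ready = None
    for line in lines:
        match = CLOSED_RE.search(line)
        if match:
            # The number after "closed:" is the gRPC status code of the stream.
            closed[int(match.group(1))] += 1
            continue
        if first_ready is None and READY_RE.search(line):
            ts = TS_RE.match(line)
            first_ready = ts.group(1) if ts else "unknown"
    return {"closed_by_status": dict(closed), "first_pool_ready": first_ready}
```

It can be fed any iterable of log lines, for example `summarize_ads_log(open("envoy.log"))`, where `envoy.log` is a hypothetical file holding the sidecar output. In gRPC, status 13 is INTERNAL and 14 is UNAVAILABLE.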

suikast42, Oct 21 '24 11:10