consul-k8s
K8S Prometheus deployed inside Consul service mesh gets connection refused on all outbound connections after a random amount of time
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
I have deployed Prometheus and Grafana using the kube-prometheus-stack Helm chart. Consul is also deployed using the standard Helm chart. Connect inject is enabled but not set as the default. The Prometheus pod is deployed inside the service mesh with transparent proxy enabled. By default it connects to a number of pods and services, some inside the service mesh and some outside.
When starting Prometheus, everything works as expected. Then, after a random amount of time, all connections that go through the transparent proxy return a connection refused error.
Interestingly, if I exclude certain outbound ports, any connections over those ports work correctly without issue. I also had to exclude certain inbound ports for Prometheus to work.
Reproduction Steps
- When running helm install with the following values.yml (a sample install command is sketched after the file):
global:
  name: consul
  metrics:
    enabled: true
  tls:
    enabled: true
    enableAutoEncrypt: true
    verify: true
  gossipEncryption:
    secretName: consul-gossip-encryption-key
    secretKey: key
  federation:
    enabled: true
    createFederationSecret: true
  acls:
    manageSystemACLs: true
    createReplicationToken: true
server:
  replicas: 3
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
connectInject:
  enabled: true
  default: false
meshGateway:
  enabled: true
syncCatalog:
  enabled: true
  default: true
  toConsul: true
  toK8S: true
  syncClusterIPServices: false
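For completeness, this is roughly how the chart is installed with the file above; the release name, namespace, and chart repository are assumptions rather than details from the original report:

  # Hypothetical install command; substitute the actual release name and namespace.
  helm repo add hashicorp https://helm.releases.hashicorp.com
  helm install consul hashicorp/consul --namespace consul --create-namespace -f values.yml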
- This is the relevant section of the values.yml for Prometheus:
prometheus:
  prometheusSpec:
    podMetadata:
      annotations:
        consul.hashicorp.com/connect-service: "prometheus-grafana-kube-pr-prometheus"
        consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: "9093"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service: "prometheus-grafana-kube-pr-prometheus"
        consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "8080,9090,10901,10902"
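One way to confirm that the sidecar was injected and that the exclusion annotations actually landed on the pod is to inspect the pod spec. The pod name and namespace below are placeholders for illustration, not values from the original report:

  # Hypothetical verification commands; replace the pod name and namespace with the real ones.
  kubectl get pod prometheus-prometheus-grafana-kube-pr-prometheus-0 -n monitoring \
    -o jsonpath='{.metadata.annotations}'
  kubectl get pod prometheus-prometheus-grafana-kube-pr-prometheus-0 -n monitoring \
    -o jsonpath='{.spec.containers[*].name}'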
Logs
When viewing the targets in Prometheus, all endpoints that are accessed over the transparent proxy are down:
kubernetes-pods (0/19 up)
Each endpoint reports the following error:
Get "http://10.X.X.X:XXX/metrics": dial tcp 10.X.X.X:XXX: connect: connection refused
Expected behavior
Prometheus connections should not degrade over time; scrape targets should remain accessible for as long as the pod runs.
Environment details
Azure AKS, latest version