linkerd2
Policy controller noisy connection logging
What is the issue?
The policy controller's watches on k8s resources keep failing (I think all of them are affected). Not a single packet is dropped according to Cilium (I used Hubble to check this), but the controller reports the connection as reset. Using curl within the container I can make the same GET request to the API server and get a response, so the CNI is not dropping this connection.
$ curl -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531"
* Request completely sent off
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< audit-id: 568e2098-a3a2-4216-9503-3a8b7cdb1050
< cache-control: no-cache, private
< content-type: application/json
< x-kubernetes-pf-flowschema-uid: f2596b47-c5e3-439f-a6e7-a84503cb1560
< x-kubernetes-pf-prioritylevel-uid: c65e2c9b-924d-4ec0-848a-49025cdd9c73
< content-length: 165
< date: Wed, 19 Jun 2024 22:33:07 GMT
<
{
"apiVersion": "policy.linkerd.io/v1beta3",
"items": [],
"kind": "HTTPRouteList",
"metadata": {
"continue": "",
"resourceVersion": "226670468"
}
* Connection #0 to host 10.0.0.1 left intact
}
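For reference, a Hubble query of roughly this shape is enough to confirm the no-drops claim; the exact filter flags are a sketch and may need adjusting for your Hubble version:

# Watch for dropped flows from the linkerd namespace toward the API server
# VIP; this stays empty while the controller keeps logging resets.
hubble observe --follow --verdict DROPPED --namespace linkerd --to-ip 10.0.0.1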
How can it be reproduced?
Using Terraform:
resource "helm_release" "cilium" {
name = "cilium"
repository = "https://helm.cilium.io/"
chart = "cilium"
version = var.cilium_version
namespace = "kube-system"
set {
name = "aksbyocni.enabled"
value = true
}
set {
name = "nodeinit.enabled"
value = true
}
set {
name = "hubble.relay.enabled"
value = true
}
set {
name = "hubble.ui.enabled"
value = true
}
}
resource "kubernetes_namespace" "linkerd_namespace" {
metadata {
name = "linkerd"
}
}
resource "kubernetes_namespace" "linkerd_viz_namespace" {
metadata {
name = "linkerd-viz"
labels = {
"linkerd.io/extension" = "viz"
}
}
}
resource "kubernetes_namespace" "linkerd_cni_namespace" {
metadata {
name = "linkerd-cni"
labels = {
"config.linkerd.io/admission-webhooks" = "disabled"
"linkerd.io/cni-resource" = "true"
}
}
}
resource "helm_release" "linkerd_cni" {
depends_on = [kubernetes_namespace.linkerd_namespace]
name = "linkerd-cni"
repository = "https://helm.linkerd.io/edge"
chart = "linkerd2-cni"
namespace = "linkerd-cni"
version = "2024.3.1"
wait = true
}
resource "helm_release" "linkerd_crds" {
depends_on = [kubernetes_namespace.linkerd_namespace, helm_release.linkerd_cni]
name = "linkerd-crds"
repository = "https://helm.linkerd.io/edge"
chart = "linkerd-crds"
namespace = "linkerd"
version = "2024.3.1"
wait = true
set {
name = "cniEnabled"
value = true
}
}
resource "helm_release" "linkerd" {
depends_on = [helm_release.linkerd_crds]
name = "linkerd-control-plane"
repository = "https://helm.linkerd.io/edge"
chart = "linkerd-control-plane"
namespace = "linkerd"
version = "2024.3.1"
wait = true
values = [
"${file("${path.module}/values-ha.yaml")}"
]
set {
name = "controllerReplicas"
value = var.replicas
}
set {
name = "cniEnabled"
value = true
}
set {
name = "identity.externalCA"
value = true
}
set {
name = "identity.issuer.scheme"
value = "kubernetes.io/tls"
}
set {
name = "disableHeartBeat"
value = true
}
set {
name = "webhookFailurePolicy"
value = "Fail"
}
set {
name = "networkValidator.connectAddr"
value = "0.0.0.0:4140"
}
}
resource "helm_release" "linker-viz" {
depends_on = [helm_release.linkerd_crds, module.linkerd_viz_policy]
name = "linkerd-viz"
repository = "https://helm.linkerd.io/edge"
chart = "linkerd-viz"
namespace = "linkerd-viz"
version = "2024.6.1"
wait = true
}
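For anyone reproducing without Terraform, the releases above correspond roughly to the following Helm commands (a sketch using the same charts and pinned versions; it assumes the namespaces already exist and does not repeat every flag from the Terraform):

helm repo add linkerd-edge https://helm.linkerd.io/edge
helm install linkerd-cni linkerd-edge/linkerd2-cni \
  --namespace linkerd-cni --version 2024.3.1
helm install linkerd-crds linkerd-edge/linkerd-crds \
  --namespace linkerd --version 2024.3.1 --set cniEnabled=true
helm install linkerd-control-plane linkerd-edge/linkerd-control-plane \
  --namespace linkerd --version 2024.3.1 \
  -f values-ha.yaml --set cniEnabled=true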
values-ha.yaml
# This values.yaml file contains the values needed to enable HA mode.
# Usage:
#   helm install -f values-ha.yaml

# -- Create PodDisruptionBudget resources for each control plane workload
enablePodDisruptionBudget: true

controller:
  # -- sets pod disruption budget parameter for all deployments
  podDisruptionBudget:
    # -- Maximum number of pods that can be unavailable during disruption
    maxUnavailable: 1

# -- Specify a deployment strategy for each control plane workload
deploymentStrategy:
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 25%

# -- add PodAntiAffinity to each control plane workload
enablePodAntiAffinity: true

# nodeAffinity:

# proxy configuration
proxy:
  resources:
    cpu:
      request: 100m
    memory:
      limit: 250Mi
      request: 20Mi

# controller configuration
controllerReplicas: 3
controllerResources: &controller_resources
  cpu: &controller_resources_cpu
    limit: ""
    request: 100m
  memory:
    limit: 250Mi
    request: 50Mi
destinationResources: *controller_resources

# identity configuration
identityResources:
  cpu: *controller_resources_cpu
  memory:
    limit: 250Mi
    request: 10Mi

# heartbeat configuration
heartbeatResources: *controller_resources

# proxy injector configuration
proxyInjectorResources: *controller_resources
webhookFailurePolicy: Fail

# service profile validator configuration
spValidatorResources: *controller_resources

# flag for linkerd check
highAvailability: true
Logs, error output, etc
Every few minutes, the policy controller logs:
2024-06-19T22:29:04.143485Z INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-19T22:29:04.367419Z INFO meshtlsauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-19T22:29:04.586154Z INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-19T22:29:05.138944Z INFO servers: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-19T22:29:05.401101Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-19T22:29:06.185944Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-19T22:29:06.443435Z INFO services: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
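The log level can be raised via Helm; a sketch, assuming the policyController.logLevel value exposed by the linkerd-control-plane chart:

helm upgrade linkerd-control-plane linkerd-edge/linkerd-control-plane \
  --namespace linkerd --reuse-values \
  --set policyController.logLevel=debug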
Running it with a debug log level, I can see this:
2024-06-14T23:01:36.098797Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(37), flags: (0x4: END_HEADERS) }
2024-06-14T23:01:36.098872Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(37) }
2024-06-14T23:01:36.098892Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(37), flags: (0x1: END_STREAM) }
2024-06-14T23:01:36.099070Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(37), flags: (0x5: END_HEADERS | END_STREAM) }
2024-06-14T23:01:36.840186Z DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-bcdf65946-6spv7
2024-06-14T23:01:39.458677Z DEBUG hyper::proto::h1::conn: incoming body decode error: Connection reset by peer (os error 104)
2024-06-14T23:01:39.458754Z DEBUG rustls::common_state: Sending warning alert CloseNotify
2024-06-14T23:01:39.458781Z DEBUG hyper::proto::h1::conn: error shutting down IO: Broken pipe (os error 32)
2024-06-14T23:01:39.458903Z DEBUG hyper::client::client: client connection error: error shutting down connection: Broken pipe (os error 32)
2024-06-14T23:01:39.458914Z DEBUG servers: kube_runtime::watcher: watcher error: ReadEvents(Custom { kind: Other, error: hyper::Error(Body, hyper::Error(Body, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })) })
2024-06-14T23:01:39.458950Z INFO servers: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-14T23:01:39.459034Z DEBUG servers: tower::buffer::worker: service.ready=true processing request
2024-06-14T23:01:39.459097Z DEBUG servers:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta2/servers?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: kube_client::client::builder: requesting
2024-06-14T23:01:39.459151Z DEBUG servers:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta2/servers?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: hyper::client::pool: reuse idle connection for ("https", 10.0.0.1)
2024-06-14T23:01:39.459450Z DEBUG hyper::proto::h1::io: flushed 1953 bytes
2024-06-14T23:01:39.460952Z DEBUG hyper::proto::h1::io: parsed 7 headers
2024-06-14T23:01:39.460968Z DEBUG hyper::proto::h1::conn: incoming body is chunked encoding
2024-06-14T23:01:39.825269Z DEBUG hyper::proto::h1::conn: incoming body decode error: Connection reset by peer (os error 104)
2024-06-14T23:01:39.825333Z DEBUG rustls::common_state: Sending warning alert CloseNotify
2024-06-14T23:01:39.825365Z DEBUG hyper::proto::h1::conn: error shutting down IO: Broken pipe (os error 32)
2024-06-14T23:01:39.825475Z DEBUG hyper::client::client: client connection error: error shutting down connection: Broken pipe (os error 32)
2024-06-14T23:01:39.827989Z DEBUG httproutes.policy.linkerd.io: kube_runtime::watcher: watcher error: ReadEvents(Custom { kind: Other, error: hyper::Error(Body, hyper::Error(Body, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })) })
2024-06-14T23:01:39.828036Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2024-06-14T23:01:39.828120Z DEBUG httproutes.policy.linkerd.io: tower::buffer::worker: service.ready=true processing request
2024-06-14T23:01:39.828184Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: kube_client::client::builder: requesting
2024-06-14T23:01:39.828270Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: hyper::client::connect::http: connecting to 10.0.0.1:443
2024-06-14T23:01:39.829996Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: hyper::client::connect::http: connected to 10.0.0.1:443
2024-06-14T23:01:39.830113Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::hs: Resuming session
2024-06-14T23:01:39.845683Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::hs: Using ciphersuite TLS13_AES_128_GCM_SHA256
2024-06-14T23:01:39.845723Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::tls13: Not resuming
2024-06-14T23:01:39.845844Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::tls13: TLS1.3 encrypted extensions: []
2024-06-14T23:01:39.845855Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::hs: ALPN protocol is None
2024-06-14T23:01:39.845877Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::tls13: Got CertificateRequest CertificateRequestPayloadTLS13 { context: , extensions: [Unknown(UnknownExtension { typ: StatusRequest, payload: }), Unknown(UnknownExtension { typ: SCT, payload: }), SignatureAlgorithms([RSA_PSS_SHA256, ECDSA_NISTP256_SHA256, ED25519, RSA_PSS_SHA384, RSA_PSS_SHA512, RSA_PKCS1_SHA256, RSA_PKCS1_SHA384, RSA_PKCS1_SHA512, ECDSA_NISTP384_SHA384, ECDSA_NISTP521_SHA512, RSA_PKCS1_SHA1, ECDSA_SHA1_Legacy]), AuthorityNames([DistinguishedName(300d310b3009060355040313026361), DistinguishedName(3011310f300d060355040313066167672d6361)])] }
2024-06-14T23:01:39.845932Z DEBUG httproutes.policy.linkerd.io:HTTP{http.method=GET http.url=https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?&watch=true&timeoutSeconds=290&allowWatchBookmarks=true&resourceVersion=228616531 otel.name="watch" otel.kind="client"}: rustls::client::common: Client auth requested but no cert/sigscheme available
2024-06-14T23:01:39.846540Z DEBUG hyper::proto::h1::io: flushed 1956 bytes
2024-06-14T23:01:39.848708Z DEBUG hyper::proto::h1::io: parsed 7 headers
2024-06-14T23:01:39.848724Z DEBUG hyper::proto::h1::conn: incoming body is chunked encoding
2024-06-14T23:01:40.045298Z DEBUG hyper::proto::h1::conn: read eof
2024-06-14T23:01:41.039244Z DEBUG hyper::proto::h1::role: setting h1 header read timeout timer
2024-06-14T23:01:41.039253Z DEBUG hyper::proto::h1::role: setting h1 header read timeout timer
2024-06-14T23:01:41.039320Z DEBUG hyper::proto::h1::io: parsed 3 headers
2024-06-14T23:01:41.039329Z DEBUG hyper::proto::h1::conn: incoming body is empty
2024-06-14T23:01:41.039357Z DEBUG hyper::proto::h1::io: parsed 3 headers
2024-06-14T23:01:41.039372Z DEBUG hyper::proto::h1::conn: incoming body is empty
2024-06-14T23:01:41.039442Z DEBUG hyper::proto::h1::io: flushed 107 bytes
2024-06-14T23:01:41.040019Z DEBUG hyper::proto::h1::io: flushed 107 bytes
2024-06-14T23:01:41.752835Z DEBUG hyper::proto::h1::conn: incoming body decode error: Connection reset by peer (os error 104)
2024-06-14T23:01:41.752898Z DEBUG rustls::common_state: Sending warning alert CloseNotify
2024-06-14T23:01:41.752927Z DEBUG hyper::proto::h1::conn: error shutting down IO: Broken pipe (os error 32)
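One way to check whether a raw long-lived watch suffers the same reset, independent of the controller, is to turn the earlier curl into a timed streaming request (a sketch, using the same in-pod service account credentials):

# Stream the watch for up to 290s; a reset well before the timeout points
# at something between the pod and the apiserver rather than at the client.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
time curl -sN --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $TOKEN" \
  "https://10.0.0.1/apis/policy.linkerd.io/v1beta3/httproutes?watch=true&timeoutSeconds=290"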
Output of linkerd check -o short:

linkerd-identity
----------------
‼ trust anchors are valid for at least 60 days
Anchors expiring soon:
* 329674952538096284210311201227367316359 root.linkerd.cluster.local will expire on 2024-06-19T23:29:52Z
see https://linkerd.io/2.13/checks/#l5d-identity-trustAnchors-not-expiring-soon for hints
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2024-06-19T23:16:07Z
see https://linkerd.io/2.13/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
linkerd-version
---------------
‼ cli is up-to-date
unsupported version channel: stable-2.13.6
see https://linkerd.io/2.13/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 24.3.1 but the latest edge version is 24.6.2
see https://linkerd.io/2.13/checks/#l5d-version-control for hints
‼ control plane and cli versions match
control plane running edge-24.3.1 but cli running stable-2.13.6
see https://linkerd.io/2.13/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-669ccf4f8-rx85b (edge-24.3.1)
* linkerd-destination-669ccf4f8-ssqd2 (edge-24.3.1)
* linkerd-identity-7f6f95b45c-sndb2 (edge-24.3.1)
* linkerd-identity-7f6f95b45c-zhcck (edge-24.3.1)
* linkerd-proxy-injector-6466cbdf79-9ftwx (edge-24.3.1)
* linkerd-proxy-injector-6466cbdf79-n5gcs (edge-24.3.1)
see https://linkerd.io/2.13/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
linkerd-destination-669ccf4f8-rx85b running edge-24.3.1 but cli running stable-2.13.6
see https://linkerd.io/2.13/checks/#l5d-cp-proxy-cli-version for hints
linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
see https://linkerd.io/2.13/checks/#l5d-injection-disabled for hints
linkerd-viz
-----------
‼ viz extension proxies are healthy
Some pods do not have the current trust bundle and must be restarted:
* metrics-api-5f994ffcd4-gt62w
* prometheus-54f9848dd8-pllbj
* tap-c745745f5-bptpg
* tap-injector-5b5898c87-j5tl7
* web-7c948c6b48-hzsz4
see https://linkerd.io/2.13/checks/#l5d-viz-proxy-healthy for hints
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* metrics-api-5f994ffcd4-gt62w (edge-24.3.1)
* prometheus-54f9848dd8-pllbj (edge-24.3.1)
* tap-c745745f5-bptpg (edge-24.3.1)
* tap-injector-5b5898c87-j5tl7 (edge-24.3.1)
* web-7c948c6b48-hzsz4 (edge-24.3.1)
see https://linkerd.io/2.13/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
metrics-api-5f994ffcd4-gt62w running edge-24.3.1 but cli running stable-2.13.6
see https://linkerd.io/2.13/checks/#l5d-viz-proxy-cli-version for hints
Environment
- Cilium version: 1.13.17
- Linkerd version: edge-24.3.1
- Kubernetes version: 1.28.3
- Cloud environment: Azure (AKS) with BYO network plugin
Possible solution
No response
Additional context
I modified the policy controller container image by adding ls, wget, sh, and curl.
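The rebuild looked something like the following; the registry, tag, and busybox copy are illustrative, since the upstream policy controller image ships without a shell or package manager:

# Hypothetical Dockerfile overlay: static busybox supplies ls, sh, and wget,
# and a static curl can be copied in the same way.
cat > Dockerfile <<'EOF'
FROM cr.l5d.io/linkerd/policy-controller:edge-24.3.1
COPY --from=busybox:stable-musl /bin/busybox /bin/busybox
RUN ["/bin/busybox", "--install", "-s", "/bin"]
EOF
docker build -t policy-controller-debug:edge-24.3.1 .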
Would you like to work on fixing this bug?
None