linkerd2
FailedDiscoveryCheck for tap API service (Kubernetes v1.24.2 via kind, Linkerd & extensions via Helm charts, separate Prometheus & Grafana)
What is the issue?
I have a working Linkerd installation in a kind-based Kubernetes v1.24.2 cluster. All Linkerd components are installed from the Helm charts via Argo CD. I have my own (functional) Prometheus, Grafana, and Jaeger, so I disable the built-in Linkerd versions and point at the URLs for my instances. The tap APIService resource shows this error:
FailedDiscoveryCheck: failing or missing response from https://10.96.74.20:443/apis/tap.linkerd.io/v1alpha1: Get "https://10.96.74.20:443/apis/tap.linkerd.io/v1alpha1": context deadline exceeded
And the output of linkerd viz check is:
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
‼ tap API service is running
FailedDiscoveryCheck: failing or missing response from https://10.96.74.20:443/apis/tap.linkerd.io/v1alpha1: Get "https://10.96.74.20:443/apis/tap.linkerd.io/v1alpha1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/2.11/checks/#l5d-tap-api for hints
√ linkerd-viz pods are injected
√ viz extension pods are running
‼ viz extension proxies are healthy
Some pods do not have the current trust bundle and must be restarted:
* metrics-api-8579f86cfb-g8bt6
* tap-696f788ffc-fbprx
* tap-injector-5b5494fb7d-5g562
* web-848fb9d444-wm6lb
see https://linkerd.io/2.11/checks/#l5d-viz-proxy-healthy for hints
√ viz extension proxies are up-to-date
‼ viz extension proxies and cli versions match
metrics-api-8579f86cfb-g8bt6 running edge-22.6.2 but cli running stable-2.11.2
see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cli-version for hints
‼ prometheus is installed and configured correctly
missing ClusterRoles: linkerd-linkerd-viz-prometheus
see https://linkerd.io/2.11/checks/#l5d-viz-prometheus for hints
√ can initialize the client
E0704 16:13:51.970794 5600 portforward.go:400] an error occurred forwarding 56957 -> 8085: error forwarding port 8085 to pod 2c71231c61ea4f0ab27cb28f35e4fde1d8a7f40e65ceda2306e15cdeeef1d11b, uid : failed to execute portforward in network namespace "/var/run/netns/cni-5f224b67-731d-680a-6d78-71ee87f34dfc": read tcp4 127.0.0.1:35230->127.0.0.1:8085: read: connection reset by peer
× viz extension self-check
Post "http://localhost:56957/api/v1/SelfCheck": net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00"
see https://linkerd.io/2.11/checks/#l5d-viz-metrics-api for hints
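For triage, the warnings and failures in the output above can be pulled out mechanically. What follows is a minimal Python sketch that assumes only the marker convention visible in that output (√ for a pass, ‼ for a warning, × for a failure). The helper name failing_checks is hypothetical and not part of the Linkerd CLI, and the sample text is a shortened copy of the output above.

```python
def failing_checks(output: str) -> list[tuple[str, str]]:
    """Return (marker, check name) pairs for every warning or failure line."""
    results = []
    for line in output.splitlines():
        line = line.strip()
        # Assumption: "‼" marks a warning and "×" a failure, as in the output above.
        if line[:1] in ("‼", "×"):
            results.append((line[0], line[1:].strip()))
    return results


# Shortened copy of the `linkerd viz check` output shown above.
sample = """\
√ tap API server has valid cert
‼ tap API service is running
    FailedDiscoveryCheck: failing or missing response
√ linkerd-viz pods are injected
× viz extension self-check
"""

for marker, name in failing_checks(sample):
    print(marker, name)
```

Detail lines (the error text and the "see ..." hints) are skipped because they do not start with a marker.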
I’m not sure what the issue with the trust bundle is. Maybe that’s happening because the initialization process is stalled. I did try deriving it from the same Issuer I use to issue the fixed Linkerd trust roots, but that turned out to be complicated because of the different namespaces and I wasn’t able to find a solution. Since I didn’t know whether it was related in the first place, I abandoned the experiment.
The fix mentioned in https://github.com/linkerd/linkerd2/issues/7233#issuecomment-964478711 appears to already be in the applied policy, so I guess my issue isn’t the same as #7301.
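The trust-bundle warning in the check output names the pods that would need a restart. A small Python sketch, assuming the message wording and the "* pod-name" bullet format shown in that output, can collect those names from a captured check run. The helper name pods_needing_restart is hypothetical, and the sample reuses two pod names from the output above.

```python
def pods_needing_restart(output: str) -> list[str]:
    """Collect pod names listed under the stale trust bundle warning."""
    pods = []
    collecting = False
    for line in output.splitlines():
        stripped = line.strip()
        # Assumption: the warning text matches the wording in the output above.
        if stripped.startswith("Some pods do not have the current trust bundle"):
            collecting = True
        elif collecting and stripped.startswith("* "):
            pods.append(stripped[2:].strip())
        else:
            # Any other line ends the bullet list.
            collecting = False
    return pods


# Excerpt of the check output, with two of the pod names reported above.
excerpt = """\
‼ viz extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
        * metrics-api-8579f86cfb-g8bt6
        * tap-696f788ffc-fbprx
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-healthy for hints
"""

print(pods_needing_restart(excerpt))
```

This only extracts names; whether restarting those pods clears the warning in this setup is not something the sketch can confirm.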
How can it be reproduced?
- Create a new cluster.
- Install cert-manager and create the linkerd-identity-issuer Certificate.
- Install Linkerd (linkerd-crds 1.1.1-edge, linkerd-control-plane 1.5.3-edge) with identity.externalCA set to false.
- Install Prometheus, Grafana, and Jaeger.
- Install linkerd-jaeger 30.3.5-edge with jaeger.enabled set to false and exporters.jaeger.endpoint inside collector.config set to the appropriate value.
- Install linkerd-viz 30.2.5-edge with these settings (and the appropriate URLs):

    grafana:
      enabled: false
    prometheus:
      enabled: false
    prometheusUrl: "FIXME"
    grafanaUrl: "FIXME"
    jaegerUrl: "FIXME"
Logs, error output, etc
(see above)
output of linkerd check -o short
Linkerd core checks
===================
linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2022-07-07T10:40:04Z
see https://linkerd.io/2.11/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
control-plane-version
---------------------
‼ control plane and cli versions match
control plane running edge-22.6.2 but cli running stable-2.11.2
see https://linkerd.io/2.11/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies and cli versions match
linkerd-destination-5bc696b4b7-wpf6w running edge-22.6.2 but cli running stable-2.11.2
see https://linkerd.io/2.11/checks/#l5d-cp-proxy-cli-version for hints
Linkerd extensions checks
=========================
linkerd-jaeger
--------------
‼ jaeger extension proxies are healthy
Some pods do not have the current trust bundle and must be restarted:
* collector-75bf4b457b-4f8cw
* jaeger-injector-db64ddbc7-h49d8
see https://linkerd.io/2.11/checks/#l5d-jaeger-proxy-healthy for hints
‼ jaeger extension proxies and cli versions match
collector-75bf4b457b-4f8cw running edge-22.6.2 but cli running stable-2.11.2
see https://linkerd.io/2.11/checks/#l5d-jaeger-proxy-cli-version for hints
Linkerd extensions checks
=========================
linkerd-viz
-----------
‼ tap API service is running
FailedDiscoveryCheck: failing or missing response from https://10.96.95.81:443/apis/tap.linkerd.io/v1alpha1: Get "https://10.96.95.81:443/apis/tap.linkerd.io/v1alpha1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
see https://linkerd.io/2.11/checks/#l5d-tap-api for hints
‼ viz extension proxies are healthy
Some pods do not have the current trust bundle and must be restarted:
* metrics-api-8579f86cfb-9ntkq
* tap-6f6c7556c6-kv4k6
* tap-injector-545864fc8-wh7gz
* web-5fc7fccd74-c5bml
see https://linkerd.io/2.11/checks/#l5d-viz-proxy-healthy for hints
‼ viz extension proxies and cli versions match
metrics-api-8579f86cfb-9ntkq running edge-22.6.2 but cli running stable-2.11.2
see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cli-version for hints
‼ prometheus is installed and configured correctly
missing ClusterRoles: linkerd-linkerd-viz-prometheus
see https://linkerd.io/2.11/checks/#l5d-viz-prometheus for hints
× viz extension self-check
Post "http://localhost:50155/api/v1/SelfCheck": net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00"
see https://linkerd.io/2.11/checks/#l5d-viz-metrics-api for hints
Status check results are ×
Environment
- Kubernetes v1.24.2
- kind v0.14.0
- Windows 10 (+ WSL2)
- Linkerd v1.1.1-edge (CRDs)/v1.5.3-edge (control plane)
- Argo CD v2.4.3
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
No response