
linkerd viz check & linkerd viz dashboard fail

Open shadiramadan opened this issue 2 years ago • 4 comments

What is the issue?

linkerd viz check and linkerd viz dashboard both fail.

I am able to access the dashboard without a problem via:

kubectl port-forward -n linkerd-viz service/web 8084:8084

How can it be reproduced?

I am running a Private GKE Cluster (no public IP).

I connect to this cluster through a bastion, using GCP's SSH tunnel through IAP feature. https://cloud.google.com/kubernetes-engine/docs/tutorials/private-cluster-bastion#connect

Because the gcloud container clusters get-credentials command writes a private IP into the kube config, I need to set a proxy. I use the kube config proxy-url field so that kubectl can reach multiple clusters natively, instead of constantly setting HTTPS_PROXY (setting HTTPS_PROXY does not resolve this issue either).

https://kubernetes.io/docs/reference/config-api/_print/#client-authentication-k8s-io-v1-Cluster
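
To see which proxy-url values a kube config actually carries, a quick text scan along these lines may help. This is only a sketch: the helper name list_proxy_urls is hypothetical, and it assumes each proxy-url key sits on its own line with an unquoted value, so it is not a real YAML parse.

```shell
# list_proxy_urls FILE
# Hypothetical helper: print every proxy-url value found in a kube config.
# Plain text scan (assumes "proxy-url: <value>" on a single line).
list_proxy_urls() {
  awk '$1 == "proxy-url:" { print $2 }' "$1"
}
```

For example, list_proxy_urls ~/.kube/config would print http://localhost:8888 for a cluster entry configured like the one described later in this thread.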


My hunch is that whatever linkerd viz is doing to try to launch / connect to the dashboard is not respecting / interfering with the proxy.

Logs, error output, etc

➜  ~ linkerd viz dashboard --verbose
DEBU[0006] Starting port forward to https://10.8.16.2/api/v1/namespaces/linkerd-viz/pods/metrics-api-569cd6f764-rwphk/portforward?timeout=30s 50927:8085
DEBU[0006] Port forward initialised
DEBU[0006] Expecting API to be served over [http://localhost:50927/api/v1/]
DEBU[0006] Making gRPC-over-HTTP call to [http://localhost:50927/api/v1/SelfCheck] []
DEBU[0006] Response from [http://localhost:50927/api/v1/SelfCheck] had headers: map[Connection:[close] Content-Type:[text/html] Server:[tinyproxy/1.10.0]]
DEBU[0006] gRPC-over-HTTP call returned status [500 Unable to connect] and content length [-1]
DEBU[0006] Retrying on error: HTTP error, status Code [500] (unexpected API response: <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
<title>500 Unable to connect</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>

<body>

<h1>Unable to connect</h1>

<p>Tinyproxy was unable to connect to the remote web server.</p>

<hr />

<p><em>Generated by <a href="https://tinyproxy.github.io/">tinyproxy</a> version 1.10.0.</em></p>

</body>

</html>
)
Waiting for linkerd-viz extension to become available
^C
➜  ~ linkerd viz check --verbose
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
√ viz extension proxies are up-to-date
√ viz extension proxies and cli versions match
√ prometheus is installed and configured correctly
DEBU[0004] Starting port forward to https://10.8.16.2/api/v1/namespaces/linkerd-viz/pods/metrics-api-569cd6f764-rwphk/portforward?timeout=30s 65226:8085
DEBU[0005] Port forward initialised
DEBU[0005] Expecting API to be served over [http://localhost:65226/api/v1/]
√ can initialize the client
DEBU[0005] Making gRPC-over-HTTP call to [http://localhost:65226/api/v1/SelfCheck] []
DEBU[0005] Response from [http://localhost:65226/api/v1/SelfCheck] had headers: map[Connection:[close] Content-Type:[text/html] Server:[tinyproxy/1.10.0]]
DEBU[0005] gRPC-over-HTTP call returned status [500 Unable to connect] and content length [-1]
DEBU[0005] Retrying on error: HTTP error, status Code [500] (unexpected API response: <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
<title>500 Unable to connect</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>

<body>

<h1>Unable to connect</h1>

<p>Tinyproxy was unable to connect to the remote web server.</p>

<hr />

<p><em>Generated by <a href="https://tinyproxy.github.io/">tinyproxy</a> version 1.10.0.</em></p>

</body>

</html>
)
/ waiting for check to complete

output of linkerd check -o short

Output is verbose as the short check hangs

➜  ~ linkerd check --verbose
Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks can be verified
√ cluster networks contains all node podCIDRs

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used
DEBU[0004] Skipping check: cni plugin ConfigMap exists. Reason: skipping check because CNI is not enabled
DEBU[0004] Skipping check: cni plugin ClusterRole exists. Reason: skipping check because CNI is not enabled
DEBU[0004] Skipping check: cni plugin ClusterRoleBinding exists. Reason: skipping check because CNI is not enabled
DEBU[0004] Skipping check: cni plugin ServiceAccount exists. Reason: skipping check because CNI is not enabled
DEBU[0004] Skipping check: cni plugin DaemonSet exists. Reason: skipping check because CNI is not enabled
DEBU[0004] Skipping check: cni plugin pod is running on all nodes. Reason: skipping check because CNI is not enabled

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
√ control plane proxies and cli versions match
DEBU[0007] Skipping check: pod injection disabled on kube-system. Reason: not run for non HA installs
DEBU[0007] Skipping check: multiple replicas of control plane pods. Reason: not run for non HA installs

Linkerd extensions checks
=========================

linkerd-multicluster
--------------------
√ Link CRD exists
√ multicluster extension proxies are healthy
√ multicluster extension proxies are up-to-date
√ multicluster extension proxies and cli versions match

\ Running viz extension check ^C

Environment

Cluster environment: Private GKE Cluster running Dataplane V2

➜  ~ kubectl version --short
Client Version: v1.22.10
Server Version: v1.21.11-gke.1100
➜  ~ linkerd version
Client version: stable-2.11.2
Server version: stable-2.11.2

My local OS is macOS on Apple M1

Possible solution

I think this would be solved by the suggestion here: https://github.com/linkerd/linkerd2/issues/1696

Additional context

No response

Would you like to work on fixing this bug?

No response

shadiramadan avatar Jun 02 '22 13:06 shadiramadan

Hi @shadiramadan! We do, in fact, use a port-forward to expose the Linkerd dashboard. You can see the port-forward being created in the logs:

DEBU[0006] Starting port forward to https://10.8.16.2/api/v1/namespaces/linkerd-viz/pods/metrics-api-569cd6f764-rwphk/portforward?timeout=30s 50927:8085

However, when the Linkerd CLI attempts to send a request to this port-forward (http://localhost:50927/api/v1/SelfCheck), that request seems to be intercepted by Tinyproxy. Whatever way you have Tinyproxy set up, it's interfering with Linkerd's health checks.
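
One way to spot this in your own output is to filter the verbose log for localhost responses that carry a tinyproxy Server header, as in the log lines above. The following is a rough sketch: the function name proxied_localhost_responses is hypothetical, and the pattern simply mirrors the DEBU line format shown in this issue.

```shell
# proxied_localhost_responses
# Hypothetical helper: read CLI --verbose output on stdin and print the lines
# where a response from http://localhost:<port>/ has a tinyproxy Server header.
# The exit status is non-zero when no such line is found (grep semantics).
proxied_localhost_responses() {
  grep -E 'Response from \[http://localhost:[0-9]+/.*Server:\[tinyproxy'
}
```

A possible invocation, assuming the debug lines may go to stderr, is: linkerd viz check --verbose 2>&1 | proxied_localhost_responses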

adleong avatar Jun 13 '22 18:06 adleong

Ohhhhhh, I don't think tinyproxy is set up to work with gRPC. When I'm just port-forwarding the dashboard directly it seems to work fine, so maybe that flow doesn't rely on gRPC at all. I'll dig in more and report back.

shadiramadan avatar Jun 13 '22 18:06 shadiramadan

I have basically the same problem. My cluster is a private GKE cluster accessed through a bastion host running tinyproxy. Looking at the tinyproxy logs, I see that the linkerd CLI sends its localhost requests via the proxy, which is defined in ~/.kube/config as proxy-url: http://localhost:8888. I am not using the HTTPS_PROXY or HTTP_PROXY environment variables.

I would expect the linkerd CLI, once the port-forward is set up, to make its localhost calls without going through the proxy configured in the kube config.

linkerd viz tap deployment/web --namespace emojivoto --verbose

DEBU[0006] Starting port forward to https://10.248.39.2/api/v1/namespaces/linkerd-viz/pods/metrics-api-b878bdccd-7n4p9/portforward?timeout=30s 40797:8085 
DEBU[0006] Port forward initialised                     
DEBU[0006] Expecting API to be served over [http://localhost:40797/api/v1/] 
DEBU[0006] Making gRPC-over-HTTP call to [http://localhost:40797/api/v1/SelfCheck] [] 
DEBU[0007] Response from [http://localhost:40797/api/v1/SelfCheck] had headers: map[Connection:[close] Content-Type:[text/html] Server:[tinyproxy/1.10.0]] 
DEBU[0007] gRPC-over-HTTP call returned status [500 Unable to connect] and content length [-1] 
Cannot connect to Linkerd Viz: HTTP error, status Code [500] (unexpected API response: <?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

tinyproxy log

INFO      Aug 04 16:46:39 [46684]: Closed connection between local client (fd:7) and remote client (fd:8)
INFO      Aug 04 16:46:39 [46614]: Closed connection between local client (fd:7) and remote client (fd:8)
CONNECT   Aug 04 16:46:39 [46692]: Request (file descriptor 7): POST http://localhost:40797/api/v1/SelfCheck HTTP/1.1
INFO      Aug 04 16:46:39 [46692]: No upstream proxy for localhost

Looking around the code, I found merge request #8625, which gave me the impression that it fixes the problem.

But in my case the problem seems to persist. I'm not sure if that's because, for the purposes of this test, the server is running linkerd stable while the CLI is edge.

The only thing I'm pretty sure about is that it's not tinyproxy's fault.

❯ ~/downloads/linkerd2-cli-edge-22.7.3-linux-amd64 viz tap deployment/web --namespace emojivoto --verbose
DEBU[0005] Starting port forward to https://10.248.39.2/api/v1/namespaces/linkerd-viz/pods/metrics-api-b878bdccd-7n4p9/portforward?timeout=30s 38331:8085 
DEBU[0005] Port forward initialised                     
Cannot connect to Linkerd Viz: rpc error: code = Unavailable desc = connection closed before server preface received
Validate the install with: linkerd viz check
❯ netstat -tapn |grep 38331
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:38331         0.0.0.0:*               LISTEN      206167/linkerd2-cli 
tcp6       0      0 ::1:38331               :::*                    LISTEN      206167/linkerd2-cli                
❯ curl -v localhost:38331                                                  
*   Trying 127.0.0.1:38331...
* Connected to localhost (127.0.0.1) port 38331 (#0)
> GET / HTTP/1.1
> Host: localhost:38331
> User-Agent: curl/7.84.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/octet-stream
< Linkerd-Error: Internal Server Error
< Date: Thu, 04 Aug 2022 18:59:59 GMT
< Content-Length: 19
< 
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
* Failure writing output to destination
* Closing connection 0

cotocisternas avatar Aug 04 '22 16:08 cotocisternas

Hi @cotocisternas

I think the errors you're seeing are at least partially due to using mismatched versions of the Linkerd CLI and control plane. Make sure that you're using a CLI version which matches your control plane.
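
A quick way to check this is to compare the two lines printed by linkerd version. The sketch below assumes the "Client version:" / "Server version:" output format shown earlier in this issue; the function name versions_match is hypothetical.

```shell
# versions_match
# Hypothetical helper: read `linkerd version` output on stdin and succeed
# only when the Client and Server version lines carry the same non-empty value.
versions_match() {
  awk -F': ' '
    /^Client version:/ { client = $2 }
    /^Server version:/ { server = $2 }
    END { exit !(client != "" && client == server) }
  '
}
```

For example: linkerd version | versions_match || echo "CLI and control plane versions differ"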

adleong avatar Aug 11 '22 21:08 adleong

Hi @shadiramadan

I changed the -enforced-host arg value from "^dashboard.example.com$" to ".*" in the top deployment, which is in the linkerd-viz namespace, and then ran the command below to access the Linkerd dashboard.

linkerd viz dashboard --address 0.0.0.0

iamhritik avatar Oct 10 '22 18:10 iamhritik