GKE private cluster blocks tap ports by default
Bug Report
What is the issue?
On a fresh GKE cluster with private nodes, installing linkerd via the CLI works fine, but tap is blocked by the GCP firewall.
How can it be reproduced?
- Create GKE cluster with private nodes
- Install linkerd via CLI script
- Attempt to run tap or use it via dashboard:
$ linkerd tap deployment/tap --namespace linkerd
Error: HTTP error, status Code [503] (unexpected API response: service unavailable
)
Usage:
linkerd tap [flags] (RESOURCE)
...
[Screenshot 2019-08-23 at 20 01 38]
Logs, error output, etc
linkerd check output
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Environment
- Kubernetes Version: v1.13.7-gke.19
- Cluster Environment: GKE
- Host OS: COS
- Linkerd version: v2.5.0
Possible solution
Open the tap pod's apiserver port in the GCP firewall for the cluster's network.
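A rough sketch of what such a firewall rule could look like with gcloud, assuming the tap port is 8089/TCP as reported in this issue. The rule name, network, control plane CIDR, and node tag below are hypothetical placeholders, not values taken from the affected cluster. The script only prints the command unless APPLY=1 is set, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: allow the GKE control plane to reach the tap port on private nodes.
# Every value below is a hypothetical placeholder; replace it for your cluster.
RULE_NAME="${RULE_NAME:-allow-master-to-linkerd-tap}"
NETWORK="${NETWORK:-default}"
MASTER_CIDR="${MASTER_CIDR:-172.16.0.0/28}"   # control plane range of the private cluster
NODE_TAG="${NODE_TAG:-gke-my-cluster-node}"   # network tag on the node VMs
TAP_PORT="${TAP_PORT:-8089}"                  # tap port reported in this issue

# Build the gcloud command as a string so it can be inspected before running.
firewall_rule_cmd() {
  printf 'gcloud compute firewall-rules create %s' "$RULE_NAME"
  printf ' --network %s --direction INGRESS --action ALLOW' "$NETWORK"
  printf ' --rules tcp:%s --source-ranges %s --target-tags %s\n' \
    "$TAP_PORT" "$MASTER_CIDR" "$NODE_TAG"
}

# Dry run by default: print the command. Set APPLY=1 to execute it.
if [ "${APPLY:-0}" = "1" ]; then
  eval "$(firewall_rule_cmd)"
else
  firewall_rule_cmd
fi
```

The control plane range and the node tag depend on the cluster. The range should appear as masterIpv4CidrBlock in the output of gcloud container clusters describe, and the tag can be read from the firewall rules GKE created for the cluster.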
Additional context
I spoke to @grampelberg earlier today for help debugging this and he figured it out. I was able to open port 8089/TCP, which matches the apiserver port on the tap pod.
Looks like this is unique to GKE private clusters, and that it is a documentation need for GKE rather than a bug. The steps here, with the port adjusted for tap, solve this issue for the dashboard. The CLI gets past the failure, but I haven't adjusted RBAC to go further there.
This section solved the issue for me without creating another firewall rule for tap.
Edited: I actually still have to create a GCP firewall rule for the tap port 8089 in order to make it work.