multicluster connectivity issue
What is the issue?
Here are the setup details for the 2 clusters - Master and Agent:
- Both clusters are on the default k3s setup, i.e., they come with the default Flannel, Traefik, etc.
- Tried in two setups - one on 2 RPIs and another on 2 VMs in Google Cloud - both report exactly the same issue
- MariaDB database as a StatefulSet on the Agent cluster
- Adminer UI on the Master cluster
- Linkerd with the multicluster extension has been installed in both clusters. The trust anchor is set up correctly as well
- Multicluster link with cluster name "agent" is created from the Agent cluster and applied to Master. All the linkerd checks pass, with linkerd mc check correctly showing the status
- MariaDB database (on the Agent cluster) has been annotated with the linkerd inject annotation. A label for the mirror is also added (see the sketch after this list)
- MariaDB service is correctly started and mariadb-svc-agent is visible in the Master cluster
- The Adminer UI does not connect to the mariadb-svc-agent service. It reports: unauthorized connection on server/linkerd-gateway. There should not be any unauthorized connection reported, since I can see that both apps, MariaDB and Adminer, are meshed (in the viz extension)
- Alternatively, if I install Adminer in the Agent cluster (the same one where MariaDB is installed), the connection to the direct MariaDB service mariadb-svc goes through fine. This proves that the connectivity between MariaDB and Adminer works fine.
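For reference, this is the export step that makes mariadb-svc-agent appear on Master (a minimal sketch; mirror.linkerd.io/exported is the standard Linkerd multicluster export label, and the service name and namespace are the ones from my setup):
# On the Agent cluster, label the service so the service-mirror on Master
# picks it up and creates the mirrored mariadb-svc-agent service there
kubectl --context=agent -n default label svc/mariadb-svc mirror.linkerd.io/exported=true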
How can it be reproduced?
Please check the setup details in the "What is the issue?" section above. Images used:
- mariadb
- adminer
Logs, error output, etc
[ 7192.934712s] INFO ThreadId(01) inbound:server{port=4143}:gateway{dst=mariadb-svc.default.svc.cluster.local:3306}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=linkerd-gateway tls=Some(Established { client_id: Some(ClientId(Name("default.default.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: Some("transport.l5d.io/v1") }) client=10.42.0.1:14266
[ 7192.935000s] INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/linkerd-gateway client.addr=10.42.0.1:14266
Output of linkerd check -o short, both for l --context=master -o short and l --context=agent -o short:
Status check results are √
Also the agent connectivity is fine. Output of l --context=master mc gateways
CLUSTER ALIVE NUM_SVC LATENCY
agent True 2 3ms
Environment
Kubernetes Client Version: v1.24.4+k3s1
Kustomize Version: v4.5.4
Server Version: v1.24.4+k3s1
Cluster Environment: K3s running on 2 RPIs, each as a 1-node cluster. Also tested on K3s running on 2 VMs on Google Cloud, each as a 1-node cluster
Host OS: RPI - bullseye; Google Cloud: Ubuntu
Linkerd Version: stable-2.12.0
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
No response
Hi @manju-rn! It looks like your linkerd-gateway is rejecting connections from the remote cluster as unauthorized. To debug this, the linkerd authz tool is useful:
> linkerd authz -n linkerd-multicluster deploy/linkerd-gateway
ROUTE  SERVER               AUTHORIZATION_POLICY  SERVER_AUTHORIZATION
*      linkerd-gateway                            linkerd-gateway
*      gateway-proxy-admin                        linkerd-gateway-probe
*      gateway-proxy-admin                        proxy-admin
This shows that the linkerd-gateway Server has a ServerAuthorization called linkerd-gateway. You should ensure that this resource exists and contains the right information:
> kubectl -n linkerd-multicluster get serverauthorization/linkerd-gateway -o yaml
[...]
spec:
client:
meshTLS:
identities:
- '*'
networks:
- cidr: 0.0.0.0/0
- cidr: ::/0
server:
name: linkerd-gateway
This says that all meshed traffic from any source should be authorized.
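Since a ServerAuthorization only authorizes traffic for the Server named in its spec.server.name, the referenced Server resource is worth inspecting in the same way (same namespace and resource names as above):
# Inspect the Server that the ServerAuthorization above refers to
kubectl -n linkerd-multicluster get server/linkerd-gateway -o yaml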
Thanks for the details. I shall check out the details later today. However, as this is a default install, I would have expected it to work out of the box. Are there some changes to be made to the ServerAuthorization objects during / after installation that I might have missed?
As I did multiple installations and confirmed that the linkerd checks were successful, I was under the impression that all the CRDs and core resources would be correct.
Yes, this should work out of the box and does in our testing. If you can provide a specific set of commands to reproduce the failure, we can investigate.
Okay. However, the out-of-the-box behaviour does not seem to be working in my setup. Here is the output of the server auth resource:
manju@rpi400:~/mariadb/k3s $ k -n linkerd-multicluster get serverauthorization/linkerd-gateway -o yaml
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"policy.linkerd.io/v1beta1","kind":"ServerAuthorization","metadata":{"annotations":{"linkerd.io/created-by":"linkerd/cli stable-2.12.0"},"labels":{"app":"linkerd-gateway","linkerd.io/extension":"multicluster"},"name":"linkerd-gateway","namespace":"linkerd-multicluster"},"spec":{"client":{"meshTLS":{"identities":["*"]},"networks":[{"cidr":"0.0.0.0/0"},{"cidr":"::/0"}]},"server":{"name":"linkerd-gateway"}}}
linkerd.io/created-by: linkerd/cli stable-2.12.0
creationTimestamp: "2022-09-16T01:44:37Z"
generation: 1
labels:
app: linkerd-gateway
linkerd.io/extension: multicluster
name: linkerd-gateway
namespace: linkerd-multicluster
resourceVersion: "27024"
uid: 48e2cd1f-8896-4407-a60f-ba97356f5123
spec:
client:
meshTLS:
identities:
- '*'
networks:
- cidr: 0.0.0.0/0
- cidr: ::/0
server:
name: linkerd-gateway
I still get the same error:
[ 2490.197192s] INFO ThreadId(01) inbound:server{port=4143}:gateway{dst=mariadb-svc.default.svc.cluster.local:3306}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=linkerd-gateway tls=Some(Established { client_id: Some(ClientId(Name("default.default.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: Some("transport.l5d.io/v1") }) client=10.42.0.1:14516
[ 2490.197465s] INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/linkerd-gateway client.addr=10.42.0.1:14516
I have attached the manifest files (attached with a log extension, please change it to yaml) for MariaDB and Adminer (UI for the DB). Here are the steps. The certificates for the root CA and the intermediate CA are generated via openssl, and the exact same ones are used while installing Linkerd in both clusters:
linkerd --context=master install --crds | kubectl --context=master apply -f -
linkerd --context=master install --identity-trust-anchors-file manjuca.crt --identity-issuer-certificate-file manjuissuer.crt --identity-issuer-key-file manjuissuer.key | kubectl --context=master apply -f -
linkerd --context=agent install --crds | kubectl --context=agent apply -f -
# Use the common trust anchor certificates
linkerd --context=agent install --identity-trust-anchors-file manjuca.crt --identity-issuer-certificate-file manjuissuer.crt --identity-issuer-key-file manjuissuer.key | kubectl --context=agent apply -f -
linkerd --context=master mc install | kubectl --context=master apply -f -
linkerd --context=agent mc install | kubectl --context=agent apply -f -
linkerd --context=agent mc link --cluster-name agent | kubectl --context=master apply -f -
# Deployed the mariadb in agent
# Deployed the adminer in master
# mariadb-svc-agent was properly created in master
# For Adminer UI - go to http://<ip:add>:9090
# provide the mariadb-svc-agent.default.svc.cluster.local in adminer with testadmin/testadmin as credentials
# Check logs of the linkerd-gateway pod (in agent)
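For the last step, a sketch of the log check, assuming the default linkerd-multicluster namespace and that the proxy container in the gateway pod is named linkerd-proxy:
# Tail the gateway proxy logs on the Agent cluster to see the denials
kubectl --context=agent -n linkerd-multicluster logs -f deploy/linkerd-gateway -c linkerd-proxy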
Just wondering if it is related to the CNI implementation. I will remove the default Flannel of K3s and try with Calico.
@adleong Any results from your findings on the setup I have?
I tried setting up Calico, but it looks like I have trouble starting the LoadBalancer and hence am unable to test the linkerd multicluster setup yet.
Thanks for the detailed reproduction instructions. We haven't had a chance to look into this yet.
Thanks. Let me know if any more details are required. I have also reproduced the same error with Calico as the CNI on the k3s clusters. This is just to remove the suspicion that the CNI (the default Flannel in k3s) may be at fault.
So I finally found the problem and fixed it. However, this is still a bug, since I think the default behaviour of Server is not honored.
The problem is that the Server component has a default value of proxyProtocol set as HTTP/1. Hence, it was not allowing the MQTT traffic. So changing proxyProtocol to unknown solved the issue for now. I will be finding out what other options work for MQTT to narrow it down.
> k --context agent -n linkerd-multicluster get server/linkerd-gateway -o yaml
apiVersion: policy.linkerd.io/v1beta1
kind: Server
[...]
spec:
podSelector:
matchLabels:
app: linkerd-gateway
port: linkerd-proxy
proxyProtocol: HTTP/1
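For reference, the manual edit described above can also be done as a one-line patch (a sketch against the resource shown here; proxyProtocol: unknown makes the proxy detect the protocol instead of assuming HTTP/1):
# Switch the gateway Server on the Agent cluster to protocol detection
kubectl --context=agent -n linkerd-multicluster patch server linkerd-gateway \
  --type=merge -p '{"spec":{"proxyProtocol":"unknown"}}'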
However, as per the documentation (https://linkerd.io/2.12/reference/authorization-policy/), the proxyProtocol should have defaulted to unknown. @adleong Please check and confirm whether this is the case. Also, is there a way to set this up during the mc link creation?
Hi @manju-rn!
https://github.com/linkerd/linkerd2/pull/9575 has been merged. However, as we discussed, it's unlikely to be related to your issue. But it sounds like you're not experiencing the problem anymore? Is there any action to take here or should we close this issue?
As of linkerd version 2.12, the issue was there. As I explained in earlier posts, the Server component does not take the default value of unknown for proxyProtocol, as should happen per the docs. It was taking HTTP/1. I resolved it by manually editing the manifest. So if you are saying that this default behaviour was corrected in the new version, then yes, I will test it out.
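Once I reinstall, I will verify the default with something like the following (a sketch using kubectl's jsonpath output; same resource names as above):
# Print just the proxyProtocol of the gateway Server
kubectl --context=agent -n linkerd-multicluster get server/linkerd-gateway -o jsonpath='{.spec.proxyProtocol}'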