linkerd2
multicluster install misses different clusterName
What is the issue?
I am linking two k8s (1.21) clusters. The local cluster's name is cluster.local; the second one's is foo.cluster.local. All mTLS is set up correctly and working for this scenario.
After a linkerd multicluster install I get TLS connection errors for the linkerd-gateway pod on the foo.cluster.local cluster.
The manifest created by linkerd multicluster install contains:
apiVersion: multicluster.linkerd.io/v1alpha1
kind: Link
metadata:
  name: foo
  namespace: linkerd-multicluster
spec:
  clusterCredentialsSecret: cluster-credentials-foo
  gatewayIdentity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
After changing gatewayIdentity to the expected linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.foo.cluster.local and reapplying the manifest to the cluster.local cluster, the connection works.
I installed linkerd via helm charts (2.11.4) and specified clusterName where possible. The multicluster helm chart does not provide this field, only identityTrustDomain and it's my understanding this should be cluster.local, since this is the domain of my CA cert.
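The two identity strings above differ only in their trailing domain. As a rough sketch (the function name gateway_identity is made up for illustration, and the format is inferred from the strings in this report rather than taken from the Linkerd source), the identity can be thought of as being composed like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose a gateway identity from its parts.
# Format inferred from the Link manifest in this issue:
#   <service-account>.<namespace>.serviceaccount.identity.<linkerd-namespace>.<trust-domain>
gateway_identity() {
  local service_account="$1" namespace="$2" linkerd_namespace="$3" trust_domain="$4"
  printf '%s.%s.serviceaccount.identity.%s.%s\n' \
    "$service_account" "$namespace" "$linkerd_namespace" "$trust_domain"
}

# Identity found in the generated Link resource (trust domain cluster.local).
gateway_identity linkerd-gateway linkerd-multicluster linkerd cluster.local
# Identity the gateway on the second cluster actually expects.
gateway_identity linkerd-gateway linkerd-multicluster linkerd foo.cluster.local
```

Only the last argument changes between the failing and the working configuration.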
How can it be reproduced?
Set up two k8s clusters, one with clusterName set to foo.cluster.local
Set up linkerd + multicluster via helm
Call linkerd --context foo multicluster install
Check the manifest for the Link resource.
Logs, error output, etc
2022-07-29T21:59:47+02:00 [255843.324546s] INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}: linkerd_app_core::serve: Connection closed error=Unexpected TLS connection to linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local from XXX.XXX.XXX.XXX:56071 client.addr=xxx.xxx.xxx.xxx:56071
output of linkerd check -o short
Linkerd core checks
===================
linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
see https://linkerd.io/2.11/checks/#l5d-injection-disabled for hints
Status check results are √
Environment
- k8s 1.21.9
- linkerd 2.11.4
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
No response
I don't think this is a bug; you should be able to fix it by changing how you run linkerd multicluster install.
The gateway identity used in cross-cluster communication is read from an annotation on the linkerd-gateway Service, so we want to make sure that value is set correctly.
The value is set by the Helm templates during linkerd multicluster install, and the cluster domain value is templated (.Values.identityTrustDomain).
So, when installing multicluster on the foo.cluster.local cluster, you should make sure to set that as well: linkerd multicluster install --cluster-name foo-cluster --set identityTrustDomain='foo.cluster.local' ....
You can verify the correctness by looking at the annotations on the linkerd-gateway Service before linking clusters.
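A minimal sketch of that check follows. It assumes the annotation key is mirror.linkerd.io/gateway-identity and that the gateway lives in the linkerd-multicluster namespace with the control plane in linkerd, as in the manifest above. The function names are hypothetical, and the kube context name is whatever you use for the target cluster.

```shell
#!/usr/bin/env bash
# Read the gateway identity annotation from the linkerd-gateway Service.
# Assumption: the annotation key is mirror.linkerd.io/gateway-identity.
gateway_identity_annotation() {
  local context="$1"
  kubectl --context "$context" -n linkerd-multicluster get service linkerd-gateway \
    -o jsonpath='{.metadata.annotations.mirror\.linkerd\.io/gateway-identity}'
}

# Compare the annotation against the identity expected for a trust domain.
check_gateway_identity() {
  local context="$1" trust_domain="$2"
  local expected="linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.${trust_domain}"
  local actual
  actual="$(gateway_identity_annotation "$context")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    echo "mismatch: got '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

For the setup in this issue, the call would be check_gateway_identity foo foo.cluster.local, run before linking the clusters.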
Turns out we already should be handling this, but it's a subtle difference between the linkerd-config ConfigMap's ClusterDomain and IdentityTrustDomain fields.
Because we are using the gateway's identity here, we want to make sure that when installing Linkerd, we specify that value as well: linkerd install --set identityTrustDomain='foo.cluster.local'. That way, linkerd multicluster install creates the linkerd-gateway Service with the right identity annotation.
So, this particular case requires two settings when installing Linkerd: the cluster domain and the identity trust domain.
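As a hedged sketch of what setting both values could look like with the CLI: the wrappers below only render manifests and leave applying them to you. The function names are hypothetical, and the snippet assumes the core chart exposes clusterDomain alongside identityTrustDomain and that the multicluster chart accepts identityTrustDomain as described above.

```shell
#!/usr/bin/env bash
# Render the control-plane manifests with both the cluster domain and the
# identity trust domain set explicitly (assumed value names: clusterDomain,
# identityTrustDomain).
render_control_plane() {
  local context="$1" domain="$2"
  linkerd --context "$context" install \
    --set clusterDomain="$domain" \
    --set identityTrustDomain="$domain"
}

# Render the multicluster extension so the linkerd-gateway Service is
# annotated with an identity in the same trust domain.
render_multicluster() {
  local context="$1" domain="$2"
  linkerd --context "$context" multicluster install \
    --set identityTrustDomain="$domain"
}
```

For example, render_control_plane foo foo.cluster.local | kubectl --context foo apply -f -, then the same pattern with render_multicluster once the control plane is healthy.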
Moving this out of the milestone for now until we hear back more about the situation described. I answered with the details provided, but unfortunately it's not been easy to tell if this was user error.