linkerd2

multicluster install misses different clusterName

Open huehnerhose opened this issue 3 years ago • 2 comments
What is the issue?

I am linking two k8s (1.21) clusters. One cluster's local domain is cluster.local; the second one's is foo.cluster.local. All mTLS is set up correctly and working for this scenario.

After running linkerd multicluster install, I get TLS connection errors for the linkerd-gateway pod on the foo.cluster.local cluster.

The manifest created by linkerd multicluster install states:

apiVersion: multicluster.linkerd.io/v1alpha1
kind: Link
metadata:
  name: foo
  namespace: linkerd-multicluster
spec:
  clusterCredentialsSecret: cluster-credentials-foo
  gatewayIdentity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local

After changing gatewayIdentity to the expected linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.foo.cluster.local and reapplying the manifest to the cluster.local cluster, the connection works.
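The generated and expected values differ only in their last component. Here is a minimal sketch of that, assuming the identity follows the pattern visible in the manifest above (service account, namespace, the literal serviceaccount.identity, the Linkerd namespace, then the trust domain). The function name and its defaults are hypothetical and are not part of Linkerd.

```python
def gateway_identity(trust_domain: str,
                     service_account: str = "linkerd-gateway",
                     namespace: str = "linkerd-multicluster",
                     linkerd_namespace: str = "linkerd") -> str:
    """Compose a gateway identity name from its parts (assumed pattern)."""
    return (
        f"{service_account}.{namespace}.serviceaccount.identity."
        f"{linkerd_namespace}.{trust_domain}"
    )


# Value found in the generated Link resource.
generated = gateway_identity("cluster.local")
# Value that made the connection work after editing the Link by hand.
expected = gateway_identity("foo.cluster.local")

# The two names share every component except the trailing domain.
identities_match = generated == expected
```

Comparing the two strings this way makes it easy to see that only the trailing domain needs to change.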

I installed linkerd via Helm charts (2.11.4) and specified clusterName wherever possible. The multicluster Helm chart does not provide this field, only identityTrustDomain, which, as I understand it, should be cluster.local, since that is the domain of my CA cert.

How can it be reproduced?

Set up two k8s clusters, one with clusterName set to foo.cluster.local

Set up linkerd + multicluster via helm

call linkerd --context foo multicluster install

check the manifest for Link resource.

Logs, error output, etc

2022-07-29T21:59:47+02:00 [255843.324546s] INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}: linkerd_app_core::serve: Connection closed error=Unexpected TLS connection to linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local from XXX.XXX.XXX.XXX:56071 client.addr=xxx.xxx.xxx.xxx:56071

output of linkerd check -o short

Linkerd core checks
===================

linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/2.11/checks/#l5d-injection-disabled for hints

Status check results are √

Environment

  • k8s 1.21.9
  • linkerd 2.11.4

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

No response

huehnerhose, Jul 29 '22 20:07

I don't think this is a bug; you should be able to fix this by changing how you run linkerd multicluster install.

The gateway identity used in cross-cluster communication is retrieved from an annotation on the linkerd-gateway Service here. So we want to make sure that value is set correctly.

The value is set by the Helm templates during linkerd multicluster install, and the cluster domain value is templated (.Values.identityTrustDomain).

So, when installing multicluster on the foo.cluster.local cluster, you should make sure to set that as well: linkerd multicluster install --cluster-name foo-cluster --set identityTrustDomain='foo.cluster.local' ....

You can verify the correctness by looking at the annotations on the linkerd-gateway Service before linking clusters.
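One way to script that check is sketched below, under two assumptions. First, that the Service has been dumped as JSON, for example with kubectl get svc linkerd-gateway -n linkerd-multicluster -o json. Second, that the annotation key is mirror.linkerd.io/gateway-identity, which should be confirmed against your installed chart version. The helper name is hypothetical.

```python
import json

# Assumed annotation key; confirm it against the installed multicluster chart.
GATEWAY_IDENTITY_ANNOTATION = "mirror.linkerd.io/gateway-identity"


def gateway_identity_matches(service_json: str,
                             trust_domain: str,
                             linkerd_namespace: str = "linkerd") -> bool:
    """Return True if the Service's gateway identity ends in the given trust domain."""
    service = json.loads(service_json)
    annotations = service.get("metadata", {}).get("annotations") or {}
    identity = annotations.get(GATEWAY_IDENTITY_ANNOTATION, "")
    # Match on the full ".identity.<linkerd namespace>.<trust domain>" suffix so
    # that "cluster.local" does not accidentally match "foo.cluster.local".
    return identity.endswith(f".identity.{linkerd_namespace}.{trust_domain}")


# Hypothetical Service JSON, trimmed to the fields the check reads.
sample_service = json.dumps({
    "metadata": {
        "name": "linkerd-gateway",
        "annotations": {
            GATEWAY_IDENTITY_ANNOTATION:
                "linkerd-gateway.linkerd-multicluster.serviceaccount."
                "identity.linkerd.cluster.local",
        },
    },
})

matches_foo = gateway_identity_matches(sample_service, "foo.cluster.local")
matches_default = gateway_identity_matches(sample_service, "cluster.local")
```

With the sample above, the check fails for foo.cluster.local and passes for cluster.local, which is the situation described in this issue.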

kleimkuhler, Aug 08 '22 20:08

Turns out we already should be handling this, but it's a subtle difference between the linkerd-config ConfigMap's ClusterDomain and IdentityTrustDomain fields.

Because we are using the gateway's identity here, we want to make sure that when installing Linkerd, we specify that value as well: linkerd install --set identityTrustDomain='foo.cluster.local'. That way, linkerd multicluster install creates the linkerd-gateway Service with the right identity annotation.

So, this particular case is going to require two configurations when installing Linkerd: the cluster domain and the identity trust domain.

kleimkuhler, Aug 10 '22 20:08

Moving this out of the milestone for now until we hear back more about the situation described. I answered with the details provided, but unfortunately it's not been easy to tell if this was user error.

kleimkuhler, Sep 20 '22 15:09