cluster-api
cluster with same name under different namespace is provisioned but no infra created
**What steps did you take and what happened:**
Execute the commands below:
- kind create cluster --name=test-mc
- export KUBECONFIG="$(kind get kubeconfig-path --name="test-mc")"
- kubectl create -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.4/cluster-api-components.yaml
- kubectl create -f https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/releases/download/v0.1.0/bootstrap-components.yaml
- clusterawsadm alpha bootstrap create-stack
- aws ssm put-parameter --name "/sigs.k8s.io/cluster-api-provider-aws/ssh-key" --type SecureString --value "$(aws ec2 create-key-pair --key-name default | jq .KeyMaterial -r)"
- export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io)
- export AWS_ACCESS_KEY_ID=$(echo $AWS_CREDENTIALS | jq .AccessKey.AccessKeyId -r)
- export AWS_SECRET_ACCESS_KEY=$(echo $AWS_CREDENTIALS | jq .AccessKey.SecretAccessKey -r)
- export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
- curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.4.2/infrastructure-components.yaml | envsubst | kubectl create -f -
- kubectl apply -f cluster.yaml, where cluster.yaml contains:
```yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: capi-quickstart
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    name: capi-quickstart
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSCluster
metadata:
  name: capi-quickstart
spec:
  # Change this value to the region you want to deploy the cluster in.
  region: us-east-2
  # Change this value to a valid SSH Key Pair present in your AWS Account.
  sshKeyName: default
```
- Wait until the cluster PHASE is Provisioned and verify that the basic infrastructure has been created in the AWS account.
- kubectl create namespace duplicate-cluster
- kubectl apply -f dup-cluster.yaml, where dup-cluster.yaml contains:
```yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: capi-quickstart
  namespace: duplicate-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    name: capi-quickstart
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSCluster
metadata:
  name: capi-quickstart
  namespace: duplicate-cluster
spec:
  # Change this value to the region you want to deploy the cluster in.
  region: us-east-2
  # Change this value to a valid SSH Key Pair present in your AWS Account.
  sshKeyName: default
```
After some time, the cluster capi-quickstart in the duplicate-cluster namespace reaches the Provisioned phase, but no infrastructure is created. Deleting the cluster also fails.

**What did you expect to happen:**
A cluster with the same name shouldn't be allowed, or the security group naming convention should be changed so that the security groups are created for every cluster.
**Anything else you would like to add:**
CAPA log:
```
I1016 05:21:14.332092 1 awscluster_controller.go:69] controllers/AWSCluster "msg"="Cluster Controller has not yet set OwnerRef" "awsCluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:21:14.346281 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:21:19.918843 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:26:00.620580 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default"
I1016 05:26:00.620881 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr"
I1016 05:26:06.073879 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:35:59.931933 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr"
I1016 05:35:59.931933 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default"
I1016 05:36:05.393440 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
```
**Environment:**
- Cluster-api version: v0.2.4
- Minikube/KIND version: v0.5.1
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`): darwin x86_64
/kind bug
There is a similar issue tracked against CAPA here: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/969
I do think this is something that we should probably validate and try to guard against in CAPI rather than CAPA, though.
@liztio we should probably ensure that we are validating this as part of the validating webhooks work you are doing.
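Something along these lines could serve as a starting point for that check. This is purely a sketch, assuming a validating webhook with a cached client; the helper name and the v1alpha2 import are illustrative, not the actual webhook code:

```go
package webhooks

import (
	"context"
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha2" // assumed API version, for illustration only
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// validateClusterNameUnique rejects a new Cluster if another Cluster with the
// same name already exists in a different namespace. Hypothetical helper, not
// part of the current webhook code.
func validateClusterNameUnique(ctx context.Context, c client.Client, newCluster *clusterv1.Cluster) error {
	var clusters clusterv1.ClusterList
	// List Clusters across all namespaces.
	if err := c.List(ctx, &clusters); err != nil {
		return err
	}
	for _, existing := range clusters.Items {
		if existing.Name == newCluster.Name && existing.Namespace != newCluster.Namespace {
			return fmt.Errorf("a Cluster named %q already exists in namespace %q; duplicate names across namespaces are not supported by all infrastructure providers",
				newCluster.Name, existing.Namespace)
		}
	}
	return nil
}
```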
@detiber shall we provide a concept table to show what CAPI and CAPA are? I assume CAPI is the cluster API provider interface and CAPA is cluster-api-provider-aws?
Ah, I saw the glossary at https://cluster-api.sigs.k8s.io/reference/glossary.html#c , thanks @detiber
I tried to reproduce this issue but ended up in a really weird state. I created two clusters but messed up the SSHKeyName so I had to delete them both and try again. But I ended up not being able to delete the duplicate cluster. You can see the state of my system here:
https://cloud.tilt.dev/snapshot/AfTE99wLlyPeKZAVyl8=
Scroll up just a bit to see some highlighted lines.
@chuckha were you ever able to make any more progress on this?
As I mentioned in the PR linked above, it is not possible in AWS to create two clusters in different namespaces with the same name. I have not attempted to fix the problem beyond the PoC linked above. I suspect there may be other components that do not respect name/namespace as primary key and only look at the name, but that should be easy enough to figure out with some dedicated testing and poking.
I'd suggest, for anyone looking to get involved here, to create a cluster, take inventory of all items that exist, then make another cluster with the same name in a different namespace and make sure all the components expected to exist, exist. Then make sure the cluster actually came up.
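To make the name/namespace-as-primary-key point concrete, the kind of change a provider would need is to derive cloud resource names and tag lookups from both namespace and name rather than the bare cluster name. A rough sketch, with made-up helper names and an illustrative tag format (not CAPA's actual code):

```go
package naming

import "fmt"

// clusterScopedName builds an identifier that is unique per management-cluster
// namespace, so two Clusters named "capi-quickstart" in "default" and
// "duplicate-cluster" no longer map to the same cloud resources.
// Hypothetical helper, for illustration only.
func clusterScopedName(namespace, name string) string {
	return fmt.Sprintf("%s-%s", namespace, name)
}

// ownedTagKey shows the scoped name being used in a resource tag lookup
// instead of just the Cluster name. The tag key format here is illustrative,
// not CAPA's actual format.
func ownedTagKey(namespace, name string) string {
	return fmt.Sprintf("sigs.k8s.io/cluster-api-provider-aws/cluster/%s", clusterScopedName(namespace, name))
}
```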
Going through open unassigned issues in the v0.3.0 milestone. We have a decent amount of work left to do on features (control plane, clusterctl, etc.). While this is an unfortunate, ugly bug, I think we need to defer it to v0.4.
/milestone Next
/remove-priority important-soon
/priority important-longterm
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
/help
@vincepri: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
It depends? The issue is complicated quite a bit by the kubernetes cloud provider integration, which requires a unique cluster name as well.
I think we have a couple of paths we can take here:
- Fully externalize the issue and document that it shouldn't be done
  - This is what we've done to date
  - Poor UX; we could end up with competing reconciliations that affect running workloads when trying to spin up a new cluster with the same name/account/region.
  - Enforcing uniqueness at the management cluster level (even across namespaces) still doesn't solve the full issue, since different management clusters could be using the same infrastructure accounts/regions.
- Introduce some type of uniqueness that we inject into the bootstrapping config (and that infrastructure providers also consume) to work around the cloud provider integration issue.
  - Would probably provide the best UX
  - However, it would also introduce backwards-compatibility challenges for existing workload clusters

I believe the second path is probably the right one to take longer term; however, it's also a far-from-trivial challenge to implement across the various providers in a backwards-compatible way.
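As a very rough illustration of what the second path could look like, the uniqueness could come from something the management cluster already has, such as the Cluster object's UID, injected into the bootstrap config and reused by infrastructure providers. This is only a sketch under that assumption; the function name and scheme are not an agreed contract:

```go
package uniquename

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// uniqueClusterName derives a per-cluster identifier that stays stable for the
// lifetime of the Cluster object and differs even when two Clusters share a
// name across namespaces or management clusters. Sketch only.
func uniqueClusterName(meta metav1.ObjectMeta) string {
	// The UID is already unique; a short suffix keeps names readable while
	// still disambiguating for the cloud provider integration.
	uid := string(meta.UID)
	if len(uid) > 8 {
		uid = uid[:8]
	}
	return fmt.Sprintf("%s-%s", meta.Name, uid)
}
```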
~~Also, if we create a cluster with the same name in two different namespaces, the kubeconfig and other secrets in the default namespace get overwritten. This way you lose all certs for the first cluster.~~
~~One solution would be to keep all secrets scoped to the specific cluster namespace. Ideally, each cluster should have its own namespace.~~
UPDATE: https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-634123240
@lomkju the secrets managed by Cluster API are scoped to the cluster's namespace. Are you seeing some different behavior?
@ncdc After testing this again, I found that the secrets are indeed created in the respective namespace; I was wrong in the above comment. The actual problem is that if we create clusters with the same name in different namespaces, the same ELB is used for both masters. That's why I sometimes get the error below: requests are being sent to the other master, which is using a different CA.
Cluster API essentially tries to use the same AWS resources (VPC, ELB, IAM, ...) for both clusters.
➜ k describe node ip-10-0-0-161.ap-south-1.compute.internal
error: You must be logged in to the server (Unauthorized)
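That behaviour is consistent with the load balancer name being derived from the cluster name alone, so both namespaces resolve to the same ELB and requests can land on a control plane signed by a different CA. A toy illustration (the naming function is hypothetical, not CAPA's actual code):

```go
package main

import "fmt"

// apiServerELBName mimics a provider that builds the API server load balancer
// name from the cluster name only, ignoring the namespace. Hypothetical.
func apiServerELBName(clusterName string) string {
	return clusterName + "-apiserver"
}

func main() {
	// "default/capi-quickstart" and "duplicate-cluster/capi-quickstart" both
	// map to the same load balancer name, hence the Unauthorized errors when
	// the request reaches the other cluster's API server.
	fmt.Println(apiServerELBName("capi-quickstart")) // capi-quickstart-apiserver
	fmt.Println(apiServerELBName("capi-quickstart")) // capi-quickstart-apiserver
}
```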
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
During backlog grooming, @detiber proposed to introduce a contract for our infrastructure providers to at least use the namespaced name and document these limitations.
/cc @randomvariable @CecileRobertMichon
FYI, in CAPD we already faced problems due to the length of the machine names (see https://github.com/kubernetes-sigs/cluster-api/issues/3599), so the idea of concatenating the namespace into resource names could make this worse.
Finding a good trade-off between shortness, uniqueness, and meaningfulness of names is the first challenge here. The second is to ensure a viable upgrade path for existing infrastructure if the naming scheme for infrastructure components changes.
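One possible shape for such a scheme, purely a sketch and not something the project has agreed on, is to keep a readable prefix and append a short hash of the namespaced name, truncating to whatever limit the infrastructure imposes (classic ELB names, for example, are capped at 32 characters):

```go
package naming

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// boundedUniqueName keeps the cluster name readable, adds a hash of
// namespace/name for uniqueness, and truncates to maxLen so it fits
// infrastructure limits (e.g. 32 characters for classic ELB names).
// Illustrative only.
func boundedUniqueName(namespace, name string, maxLen int) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	suffix := hex.EncodeToString(sum[:])[:6]

	// Reserve room for "-" + suffix, then truncate the readable part.
	budget := maxLen - len(suffix) - 1
	if budget < 1 {
		budget = 1
	}
	if len(name) > budget {
		name = name[:budget]
	}
	return fmt.Sprintf("%s-%s", name, suffix)
}
```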
/kind documentation
/assign @randomvariable
to document the contract guidelines for providers
/milestone v1.1
/assign @yastij to reassess
/triage accepted
This is up to providers; if we want this to happen, the topic should be raised in the office hours and an agreement between provider implementers should be reached.
/help
Is this issue specific to capa? Can capz or capv create clusters of the same name in different namespaces?
CAPA was the only one I was asked to test. I haven't verified CAPZ or CAPV.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/close
This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push / someone investing time in this effort.
We can re-open (or re-create) it whenever the conditions and the required community consensus to work on it are in place.
@fabriziopandini: Closing this issue.
In response to this:
/close
This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push / someone investing time in this effort.
We can re-open (or re-create) it whenever the conditions and the required community consensus to work on it are in place.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.