cluster-api
cluster with same name under different namespace is provisioned but no infra created
**What steps did you take and what happened:**
Execute the commands below:
- kind create cluster --name=test-mc
- export KUBECONFIG="$(kind get kubeconfig-path --name="test-mc")"
- kubectl create -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.4/cluster-api-components.yaml
- kubectl create -f https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/releases/download/v0.1.0/bootstrap-components.yaml
- clusterawsadm alpha bootstrap create-stack
- aws ssm put-parameter --name "/sigs.k8s.io/cluster-api-provider-aws/ssh-key" --type SecureString --value "$(aws ec2 create-key-pair --key-name default | jq .KeyMaterial -r)"
- export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io)
- export AWS_ACCESS_KEY_ID=$(echo $AWS_CREDENTIALS | jq .AccessKey.AccessKeyId -r)
- export AWS_SECRET_ACCESS_KEY=$(echo $AWS_CREDENTIALS | jq .AccessKey.SecretAccessKey -r)
- export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
- curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.4.2/infrastructure-components.yaml | envsubst | kubectl create -f -
- kubectl apply -f cluster.yaml, where cluster.yaml contains:
```yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: capi-quickstart
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    name: capi-quickstart
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSCluster
metadata:
  name: capi-quickstart
spec:
  # Change this value to the region you want to deploy the cluster in.
  region: us-east-2
  # Change this value to a valid SSH Key Pair present in your AWS Account.
  sshKeyName: default
```
- Wait until the cluster PHASE is Provisioned and verify that the basic infrastructure has been created in the AWS account.
- kubectl create namespace duplicate-cluster
- kubectl apply -f dup-cluster.yaml, where dup-cluster.yaml contains:
```yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: capi-quickstart
  namespace: duplicate-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    name: capi-quickstart
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSCluster
metadata:
  name: capi-quickstart
  namespace: duplicate-cluster
spec:
  # Change this value to the region you want to deploy the cluster in.
  region: us-east-2
  # Change this value to a valid SSH Key Pair present in your AWS Account.
  sshKeyName: default
```
After some time, the cluster capi-quickstart in the duplicate-cluster namespace reaches the Provisioned phase, but no infrastructure is created. Deleting the cluster also fails.

**What did you expect to happen:**
A cluster with the same name shouldn't be allowed, or the security group naming convention should be changed so that the security groups are created for every cluster.
**Anything else you would like to add:**
CAPA log:
```
I1016 05:21:14.332092 1 awscluster_controller.go:69] controllers/AWSCluster "msg"="Cluster Controller has not yet set OwnerRef" "awsCluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:21:14.346281 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:21:19.918843 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:26:00.620580 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default"
I1016 05:26:00.620881 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr"
I1016 05:26:06.073879 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
I1016 05:35:59.931933 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr"
I1016 05:35:59.931933 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default"
I1016 05:36:05.393440 1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
```
**Environment:**
- Cluster-api version: v0.2.4
- Minikube/KIND version: v0.5.1
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`): darwin x86_64
/kind bug
There is a similar issue tracked against CAPA here: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/969
I do think this is something that we should probably validate and try to guard against in CAPI rather than CAPA, though.
@liztio we should probably ensure that we are validating this as part of the validating webhooks work you are doing.
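Something along these lines could serve as a starting point for that check. This is purely a sketch, assuming a validating webhook with a cached client; the helper name and the v1alpha2 import are illustrative, not the actual webhook code:

```go
package webhooks

import (
	"context"
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha2" // assumed API version, for illustration only
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// validateClusterNameUnique rejects a new Cluster if another Cluster with the
// same name already exists in a different namespace. Hypothetical helper, not
// part of the current webhook code.
func validateClusterNameUnique(ctx context.Context, c client.Client, newCluster *clusterv1.Cluster) error {
	var clusters clusterv1.ClusterList
	// List Clusters across all namespaces.
	if err := c.List(ctx, &clusters); err != nil {
		return err
	}
	for _, existing := range clusters.Items {
		if existing.Name == newCluster.Name && existing.Namespace != newCluster.Namespace {
			return fmt.Errorf("a Cluster named %q already exists in namespace %q; duplicate names across namespaces are not supported by all infrastructure providers",
				newCluster.Name, existing.Namespace)
		}
	}
	return nil
}
```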
@detiber shall we provide a concept table to show what CAPI and CAPA are? I assume CAPI is the cluster API provider interface and CAPA is cluster-api-provider-aws?
Ah, I saw the glossary at https://cluster-api.sigs.k8s.io/reference/glossary.html#c , thanks @detiber
I tried to reproduce this issue but ended up in a really weird state. I created two clusters but messed up the SSHKeyName so I had to delete them both and try again. But I ended up not being able to delete the duplicate cluster. You can see the state of my system here:
https://cloud.tilt.dev/snapshot/AfTE99wLlyPeKZAVyl8=
Scroll up just a bit to see some highlighted lines.
@chuckha were you ever able to make any more progress on this?
As I mentioned in the PR linked above, it is not possible in AWS to create two clusters in different namespaces with the same name. I have not attempted to fix the problem beyond the PoC linked above. I suspect there may be other components that do not respect name/namespace as primary key and only look at the name, but that should be easy enough to figure out with some dedicated testing and poking.
I'd suggest, for anyone looking to get involved here, to create a cluster, take inventory of all items that exist, then make another cluster with the same name in a different namespace and make sure all the components expected to exist, exist. Then make sure the cluster actually came up.
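To make the name/namespace-as-primary-key point concrete, the kind of change a provider would need is to derive cloud resource names and tag lookups from both namespace and name rather than the bare cluster name. A rough sketch, with made-up helper names and an illustrative tag format (not CAPA's actual code):

```go
package naming

import "fmt"

// clusterScopedName builds an identifier that is unique per management-cluster
// namespace, so two Clusters named "capi-quickstart" in "default" and
// "duplicate-cluster" no longer map to the same cloud resources.
// Hypothetical helper, for illustration only.
func clusterScopedName(namespace, name string) string {
	return fmt.Sprintf("%s-%s", namespace, name)
}

// ownedTagKey shows the scoped name being used in a resource tag lookup
// instead of just the Cluster name. The tag key format here is illustrative,
// not CAPA's actual format.
func ownedTagKey(namespace, name string) string {
	return fmt.Sprintf("sigs.k8s.io/cluster-api-provider-aws/cluster/%s", clusterScopedName(namespace, name))
}
```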
Going through open unassigned issues in the v0.3.0 milestone. We have a decent amount of work left to do on features (control plane, clusterctl, etc.). While this is an unfortunate, ugly bug, I think we need to defer it to v0.4.
/milestone Next
/remove-priority important-soon
/priority important-longterm
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
/help
@vincepri: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
It depends? The issue is complicated quite a bit by the kubernetes cloud provider integration, which requires a unique cluster name as well.
I think we have a couple of paths we can take here:
- Fully externalize the issue and document that it shouldn't be done
  - This is what we've done to date
  - Poor UX; we could end up with competing reconciliations that affect running workloads when trying to spin up a new cluster with the same name/account/region.
  - Enforcing uniqueness at the management cluster level (even across namespaces) still doesn't solve the full issue, since different management clusters could be using the same infrastructure accounts/regions.
- Introduce some type of uniqueness that we inject into the bootstrapping config (and that infrastructure providers also consume) to work around the cloud provider integration issue.
  - Would probably provide the best UX
  - However, it would also introduce backwards-compatibility challenges for existing workload clusters

I believe the second path is probably the right one to take longer term; however, it's also a far-from-trivial challenge to implement across the various providers in a backwards-compatible way.
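As a very rough illustration of what the second path could look like, the uniqueness could come from something the management cluster already has, such as the Cluster object's UID, injected into the bootstrap config and reused by infrastructure providers. This is only a sketch under that assumption; the function name and scheme are not an agreed contract:

```go
package uniquename

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// uniqueClusterName derives a per-cluster identifier that stays stable for the
// lifetime of the Cluster object and differs even when two Clusters share a
// name across namespaces or management clusters. Sketch only.
func uniqueClusterName(meta metav1.ObjectMeta) string {
	// The UID is already unique; a short suffix keeps names readable while
	// still disambiguating for the cloud provider integration.
	uid := string(meta.UID)
	if len(uid) > 8 {
		uid = uid[:8]
	}
	return fmt.Sprintf("%s-%s", meta.Name, uid)
}
```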
~~Also, if we create a cluster with the same name in two different namespaces, the kubeconfig and other secrets in the default namespace get overwritten. This way you lose all certs for the first cluster.~~
~~One solution would be to keep all secrets scoped to the specific cluster namespace. Ideally, each cluster should have its own namespace.~~
UPDATE: https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-634123240
@lomkju the secrets managed by Cluster API are scoped to the cluster's namespace. Are you seeing some different behavior?
@ncdc After testing this again, I found that the secrets are indeed created in the respective namespace; I was wrong in the above comment. The actual problem is that if we create clusters with the same name in different namespaces, the same ELB is used for both masters. That's why I sometimes get the error below: requests are being sent to the other master, which is using a different CA.
Cluster API essentially tries to use the same AWS resources (VPC, ELB, IAM, ...) for both clusters.
➜ k describe node ip-10-0-0-161.ap-south-1.compute.internal
error: You must be logged in to the server (Unauthorized)
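That behaviour is consistent with the load balancer name being derived from the cluster name alone, so both namespaces resolve to the same ELB and requests can land on a control plane signed by a different CA. A toy illustration (the naming function is hypothetical, not CAPA's actual code):

```go
package main

import "fmt"

// apiServerELBName mimics a provider that builds the API server load balancer
// name from the cluster name only, ignoring the namespace. Hypothetical.
func apiServerELBName(clusterName string) string {
	return clusterName + "-apiserver"
}

func main() {
	// "default/capi-quickstart" and "duplicate-cluster/capi-quickstart" both
	// map to the same load balancer name, hence the Unauthorized errors when
	// the request reaches the other cluster's API server.
	fmt.Println(apiServerELBName("capi-quickstart")) // capi-quickstart-apiserver
	fmt.Println(apiServerELBName("capi-quickstart")) // capi-quickstart-apiserver
}
```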
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
During backlog grooming, @detiber proposed to introduce a contract for our infrastructure providers to at least use the namespaced name and document these limitations.
/cc @randomvariable @CecileRobertMichon
FYI, in CAPD we already faced problems due to the length of the machine names (see https://github.com/kubernetes-sigs/cluster-api/issues/3599), so the idea of concatenating the namespace into resource names could make this worse.
Finding a good trade-off between shortness, uniqueness, and meaningfulness of names is the first challenge here. The second is to ensure a viable upgrade path for existing infrastructure if the naming scheme for infrastructure components changes.
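One possible shape for such a scheme, purely a sketch and not something the project has agreed on, is to keep a readable prefix and append a short hash of the namespaced name, truncating to whatever limit the infrastructure imposes (classic ELB names, for example, are capped at 32 characters):

```go
package naming

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// boundedUniqueName keeps the cluster name readable, adds a hash of
// namespace/name for uniqueness, and truncates to maxLen so it fits
// infrastructure limits (e.g. 32 characters for classic ELB names).
// Illustrative only.
func boundedUniqueName(namespace, name string, maxLen int) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	suffix := hex.EncodeToString(sum[:])[:6]

	// Reserve room for "-" + suffix, then truncate the readable part.
	budget := maxLen - len(suffix) - 1
	if budget < 1 {
		budget = 1
	}
	if len(name) > budget {
		name = name[:budget]
	}
	return fmt.Sprintf("%s-%s", name, suffix)
}
```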
/kind documentation
/assign @randomvariable
to document the contract guidelines for providers
/milestone v1.1
/assign @yastij to reassess
/triage accepted
This is up to providers; if we want this to happen, the topic should be raised in the office hours and an agreement between provider implementers should be reached.
/help
Is this issue specific to capa? Can capz or capv create clusters of the same name in different namespaces?
CAPA was the only one I was asked to test. I haven't verified CAPZ or CAPV.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/close
This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push / someone investing time in this effort.
We can re-open (or re-create) it whenever the conditions and the required community consensus to work on it are in place.
@fabriziopandini: Closing this issue.
In response to this:
/close
This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push / someone investing time in this effort.
We can re-open (or re-create) it whenever the conditions and the required community consensus to work on it are in place.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.