
Long names of `Cluster` (and other) resources cause bootstrap cluster creation to fail with a generic error

Open dejarikra opened this issue 1 year ago • 2 comments

What happened: Attempting to bootstrap an EKS Anywhere cluster using a (very) long name for the Cluster (and other) resources causes creation of the kind bootstrap cluster to fail.

Among the plethora of logs, these snippets seem to be the most relevant:

[apiclient] All control plane components are healthy after 22.003180 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

...

I0115 20:11:27.767218     147 uploadconfig.go:131] [upload-config] Preserving the CRISocket information for the control-plane node
I0115 20:11:27.767232     147 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///run/containerd/containerd.sock" to the Node API object "diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane" as an annotation

...

I0115 20:13:27.771568     147 round_trippers.go:553] GET https://diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane:6443/api/v1/nodes/diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane?timeout=10s 404 Not Found in 2 milliseconds
I0115 20:13:27.775214     147 round_trippers.go:553] GET https://diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane:6443/api/v1/nodes/diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane?timeout=10s 404 Not Found in 2 milliseconds

...

nodes "diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane" not found
Error writing Crisocket information for the control-plane node

What you expected to happen: Bootstrap cluster creates successfully.

Alternatively (if my guess is correct that the length of the Cluster name is the root cause), a more helpful error message about the length of the names of (certain) Kubernetes objects should be shown. In that case, it is probably fair to expect eksctl anywhere to validate the length of the Cluster (and other resource) names before actually creating the bootstrap cluster.
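A minimal sketch of what such a pre-flight check could look like, written in Python purely for illustration. This is not eksctl anywhere code: `validate_cluster_name` and both constants are hypothetical. The suffix is read off the node name in the logs above, and the 63-character limit is an assumption (the DNS label limit); nothing in this report confirms that it is the limit actually being hit.

```python
# Hypothetical pre-flight check; NOT part of eksctl anywhere.

# Suffix observed on the control plane node name in the logs above.
DERIVED_SUFFIX = "-eks-a-cluster-control-plane"

# ASSUMPTION: 63 characters (the DNS label limit). The report only shows
# that a 64-character node name fails; the exact limit is not confirmed.
ASSUMED_MAX_NODE_NAME_LEN = 63


def validate_cluster_name(name: str,
                          max_node_name_len: int = ASSUMED_MAX_NODE_NAME_LEN) -> None:
    """Raise ValueError if the derived node name would exceed the limit."""
    node_name = name + DERIVED_SUFFIX
    if len(node_name) > max_node_name_len:
        # Longest cluster name that still fits once the suffix is appended.
        budget = max_node_name_len - len(DERIVED_SUFFIX)
        raise ValueError(
            f"cluster name {name!r} is {len(name)} characters; the derived "
            f"node name {node_name!r} would be {len(node_name)} characters, "
            f"over the assumed limit of {max_node_name_len}. "
            f"Use a cluster name of at most {budget} characters."
        )


if __name__ == "__main__":
    try:
        validate_cluster_name("diamond-foo22-prod-eksa-mgmt-cluster")
    except ValueError as err:
        print(f"validation failed: {err}")
```

Under that assumed limit, the 36-character name from this report is rejected up front with a message that names the offending length, while a 20-character name passes.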

How to reproduce it (as minimally and precisely as possible):

  1. Try to bootstrap an EKS Anywhere cluster (using an almost default configuration) against the vSphere provider (it might be reproducible with other providers as well, as the problem manifests itself when creating the KIND bootstrap cluster).
  2. Make sure to use a very long name for the Cluster (and similarly for the other resources, including VSphereMachineConfig, VSphereDatacenterConfig). I used a name with 36 characters.

Renaming the Cluster object down to a more sensible 20-character string (and similarly for VSphereMachineConfig, VSphereDatacenterConfig resources) fixed the problem.

Anything else we need to know?:

  • Cluster name (obfuscated, but identical length and character layout): diamond-foo22-prod-eksa-mgmt-cluster
  • As is evident in the last few lines of the logs shown above, the long Cluster name caused eksctl anywhere to name the control plane node diamond-foo22-prod-eksa-mgmt-cluster-eks-a-cluster-control-plane (which is 64 characters in length)

Environment:

  • EKS Anywhere Release: v0.18.4
  • EKS Distro Release: bottlerocket-v1.28.4-eks-d-1-28-12-eks-a-56-amd64
  • Operating System: Fresh Ubuntu 22.04 VM

Note: I am not entirely sure if this is the correct place to file this bug; but it seemed a good place to start. If the community feels reporting this to kind or kubeadm makes more sense, I'll be happy to do so.

dejarikra, Jan 15 '24 20:01

Hello, thanks for the report. We will try to replicate this internally and get back to you.

jiayiwang7, Jan 22 '24 16:01

Thanks for reporting @dejarikra! We were able to reproduce the issue on our end as well by setting the EKS-A Cluster resource's name to exactly 36 characters long. On the EKS-A side, we add a suffix -eks-a-cluster to get the KinD cluster's name (which makes it 50 characters long) and on top of this, KinD also adds a -control-plane suffix to the cluster name to arrive at the name for the control plane node (container), which makes the control plane node name 64 characters in length.
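The length arithmetic can be checked with a short Python sketch. The two suffixes are the ones named in this comment; the function name `derived_names` is made up for illustration and does not correspond to anything in EKS-A or KinD.

```python
# Suffixes as described in the comment above.
EKSA_SUFFIX = "-eks-a-cluster"       # appended by EKS-A to form the KinD cluster name
KIND_NODE_SUFFIX = "-control-plane"  # appended by KinD to form the node (container) name


def derived_names(cluster_name: str) -> tuple[str, str]:
    """Return (KinD cluster name, control plane node name) for an EKS-A cluster name."""
    kind_cluster = cluster_name + EKSA_SUFFIX
    node = kind_cluster + KIND_NODE_SUFFIX
    return kind_cluster, node


name = "diamond-foo22-prod-eksa-mgmt-cluster"
kind_cluster, node = derived_names(name)
print(len(name), len(kind_cluster), len(node))  # 36 50 64
```

Each suffix is 14 characters, so every EKS-A cluster name grows by 28 characters by the time it becomes the control plane node name.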

The interesting part is that Kind itself allows a maximum cluster name length of 50. I tried creating a cluster with a 37-character name and, as expected, hit the corresponding warning. It was followed by a different error when creating the control-plane container: docker run is invoked with the --hostname parameter, which in this case is supplied a 65-character hostname that does not conform to sethostname's 64-character restriction.

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: sethostname: invalid argument: unknown.

Based on some reading, I concluded that both Kind and sethostname allow a maximum of 64 characters, so the issue may lie in kubeadm or Kubernetes itself. I will dig further and update this issue if I find something.

abhay-krishna, Jan 23 '24 04:01