EKS v0.19.5 Creating cluster in Docker fails at some point
Filing this as a problem report. I tried to initialize a dev cluster on macOS (Sonoma 14.4.1) with Docker Desktop v4.30.0, following the documentation, and ran the create command with a higher verbosity level:
eksctl anywhere create cluster -f $CLUSTER_NAME.yaml -v 9
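(For context, the cluster spec is just the default one from the Docker provider docs, generated along these lines, with nothing customized:)

eksctl anywhere generate clusterconfig $CLUSTER_NAME --provider docker > $CLUSTER_NAME.yaml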
I am using the latest release, as the CLI-tools image in the log shows:
Initializing long running container {"name": "eksa_1715243480381280000", "image": "public.ecr.aws/eks-anywhere/cli-tools:v0.19.5-eks-a-65"}
Initialization goes well: the containers for the control plane, load balancer, etcd, and so on are created successfully. But the creation process then stops at this point:
2024-05-09T10:52:50.466+0200 V1 cleaning up temporary namespace for diagnostic collectors {"namespace": "eksa-diagnostics"}
2024-05-09T10:52:50.466+0200 V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2024-05-09T10:52:50.466+0200 V6 Executing command {"cmd": "/usr/local/bin/docker exec -i eksa_1715244428714146000 kubectl delete namespace eksa-diagnostics --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig"}
2024-05-09T10:52:55.641+0200 V5 Retry execution successful {"retries": 1, "duration": "5.175007875s"}
2024-05-09T10:52:55.642+0200 V4 Task finished {"task_name": "collect-cluster-diagnostics", "duration": "17.227805209s"}
2024-05-09T10:52:55.642+0200 V4 ----------------------------------
2024-05-09T10:52:55.642+0200 V4 Saving checkpoint {"file": "mgmt-checkpoint.yaml"}
2024-05-09T10:52:55.643+0200 V4 Tasks completed {"duration": "5m38.393764542s"}
2024-05-09T10:52:55.643+0200 V3 Cleaning up long running container {"name": "eksa_1715244428714146000"}
2024-05-09T10:52:55.643+0200 V6 Executing command {"cmd": "/usr/local/bin/docker rm -f -v eksa_1715244428714146000"}
Error: creating namespace eksa-system: The connection to the server localhost:8080 was refused - did you specify the right host or port?
To me, it looks like the temporary container is removed too early, and the script then does not handle the missing kubeconfig.
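If that reading is right, the kubeconfig written to the workspace should still be usable directly from the host even after the container is gone. A quick way to verify (assuming Docker Desktop exposes the control-plane port locally, as the generated kubeconfig expects) would be:

kubectl get nodes --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig
kubectl get namespaces --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig

If those work, the kind cluster itself is healthy and only the step that creates the eksa-system namespace lost its kubeconfig and fell back to kubectl's default of localhost:8080.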
So, my questions: is this considered a bug, is there a quick workaround, and is it possible to resume the cluster creation procedure from the point of failure?
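The only workaround I can see so far (untested, just a sketch) is to wipe the partial state and start from scratch rather than resuming:

kind get clusters                                 # find the kind cluster that eksctl anywhere created
kind delete cluster --name <cluster-from-above>   # remove it
rm -f mgmt-checkpoint.yaml                        # drop the checkpoint saved by the failed run
eksctl anywhere create cluster -f $CLUSTER_NAME.yaml -v 9

I don't know whether the CLI can actually resume from mgmt-checkpoint.yaml, hence the questions above.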