NodeCreationFailure for Bottlerocket v1.9

Open KevinCiz opened this issue 3 years ago • 4 comments

Image I'm using: `amazon/bottlerocket-aws-k8s-1.23-x86_64-v1.9.2-b8074d44`

What I expected to happen: The node joins the node group.

What actually happened: The node failed to join the EKS node group, but it is visible and healthy in EC2.

**How to reproduce the problem:** Use a v1.9.x Bottlerocket AMI in the launch template and attach the launch template to the node group. Our Bottlerocket user data is below:

```toml
[settings.kubernetes]
cluster-name = "my-cluster"
api-server = "xxx.eks.amazonaws.com"
cluster-certificate = "cert"
max-pods = 30

[settings.kubernetes.node-labels]
"kubernetes.io/cluster/my-cluster" = "owned"

[settings.kubernetes.node-taints]

[settings.host-containers.admin]
enabled = false
superpowered = false
```

KevinCiz, Sep 19 '22 09:09

@KevinCiz, thanks for reaching out! Can you try setting the `kubernetes.io/cluster/my-cluster` tag to `owned` in the resource tags (instances) of your launch template, and removing the following from your user data:

```toml
[settings.kubernetes.node-labels]
"kubernetes.io/cluster/my-cluster" = "owned"
[settings.kubernetes.node-taints]
```

If the node still doesn't join, I would check for kubelet errors from inside the admin container (which will need to be superpowered).
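For reference, here is a sketch of what the reporter's Kubernetes user data might look like with this suggestion applied: the `[settings.kubernetes.node-labels]` and empty `[settings.kubernetes.node-taints]` tables are removed, and the `kubernetes.io/cluster/my-cluster = owned` tag lives in the launch template's instance resource tags instead. The `api-server` and `cluster-certificate` values are the placeholders from the original report, and other tables (such as `[settings.host-containers.admin]`) are omitted here.

```toml
# Bottlerocket user data with the cluster-ownership label and the empty
# node-taints table removed; the ownership tag is set on the launch template.
# api-server and cluster-certificate are placeholders from the report.
[settings.kubernetes]
cluster-name = "my-cluster"
api-server = "xxx.eks.amazonaws.com"
cluster-certificate = "cert"
max-pods = 30
```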

jpculp, Sep 19 '22 20:09

@jpculp, thanks for the reply. I've removed `kubernetes.io/cluster/my-cluster` from my user data and added it to the resource tags in my launch template, but I'm still getting the same error: instances failed to join the Kubernetes cluster.

Do you have a guide on how to check the kubelet errors from the admin container? I can't find `/var/lib/kubelet` in my admin container. I tried to check the Amazon logs via SSM, but I get permission denied when accessing `/var/log/amazon/` (this path isn't available in the admin container).

KevinCiz, Sep 21 '22 02:09

@KevinCiz, from the admin container you can run `sudo sheltie` to drop into a shell on the host and access those paths.

(This only works with `superpowered = true`, since the `sheltie` wrapper relies on being in the host PID namespace.)
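As a sketch, assuming the rest of the user data stays the same, the admin container table from the original report (`enabled = false`, `superpowered = false`) would need to change to something like the following before `sudo sheltie` can be used:

```toml
# Enable the admin host container with host-level privileges so that
# `sudo sheltie` can enter the host PID namespace.
[settings.host-containers.admin]
enabled = true
superpowered = true
```

Because a superpowered admin container has host-level access, it may be worth setting these back to `false` once troubleshooting is done.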

bcressey, Sep 21 '22 03:09

`apiclient get` is another useful troubleshooting command: it dumps all the configured settings, so you can check whether something isn't being set at launch as expected.

bcressey, Sep 21 '22 03:09

Is this still an issue? Were you able to get this to join @KevinCiz?

stmcginnis, Apr 06 '23 20:04

@stmcginnis Hey, yes, this was resolved. It had nothing to do with Bottlerocket OS itself; it was related to the security group. Because we're in a restricted network, our node group's security group wasn't able to reach the internet. I found that temporarily setting the outbound rule to `0.0.0.0/0` let Bottlerocket OS update and successfully join the node group.

KevinCiz, Apr 06 '23 20:04