amazon-eks-ami
[bug] script allows passing incorrect kubelet arguments
Hey!
Issue
Recently, the company I work for had an incident caused by incorrect arguments being passed to the kubelet via --kubelet-extra-args in the EKS Terraform configuration.
The Terraform provider passes these arguments to bootstrap.sh, and it seems that the script accepts incorrect arguments but never checks afterwards whether the kubelet has started.
With incorrect arguments the kubelet cannot start, so the node cannot join the cluster. Despite this, EKS does not consider such nodes unhealthy.
Proposed solution
Add checks to determine whether the kubelet has started or not.
Very roughly:
if systemctl is-active --quiet kubelet; then
  log "INFO: kubelet service is active and running."
else
  log "ERROR: kubelet service failed to start."
  exit 1 # Exit if kubelet did not start successfully
fi
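
A slightly more defensive variant of the check above, as a rough sketch only. A kubelet started with bad flags may report "active" for a moment before it exits and systemd restarts it, so a single is-active call right after start could pass by accident. The helper below requires the unit to be active on several consecutive checks. The function name wait_for_unit_stable, the SYSTEMCTL override variable, the default counts and the local log function are all hypothetical and not taken from bootstrap.sh.

```shell
#!/usr/bin/env bash

# Assumption: SYSTEMCTL is a hypothetical override hook so the check can be
# exercised without systemd; by default the real systemctl is used.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Stand-in logger for this sketch; it writes timestamped messages to stderr.
log() {
  echo >&2 "$(date '+%Y-%m-%dT%H:%M:%S%z') $*"
}

# Return 0 only if the unit is active on every one of $checks consecutive
# polls, spaced $delay seconds apart. Return 1 on the first failed poll.
wait_for_unit_stable() {
  local unit="$1" checks="${2:-5}" delay="${3:-2}"
  local i
  for ((i = 1; i <= checks; i++)); do
    if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
      log "ERROR: $unit service is not active (check $i/$checks)."
      return 1
    fi
    # No need to sleep after the final check.
    if ((i < checks)); then
      sleep "$delay"
    fi
  done
  log "INFO: $unit service stayed active for $checks consecutive checks."
  return 0
}
```

It would presumably be called right after the kubelet is started, e.g. `wait_for_unit_stable kubelet || exit 1`. Where exactly that belongs in the script, and what check count and delay are reasonable, is for the maintainers to decide.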