Support Warm Pool for EC2 Auto Scaling!
What I'd like: It would be good for Bottlerocket to support warm pools.
Currently, kubelets on warm pool instances are started, so the instances register with the control plane. The instances are then stopped, which leaves the nodes in the "NotReady" state.
Therefore, I think Bottlerocket instances should (optionally) read their current lifecycle state and postpone starting the kubelet (as in https://github.com/kubernetes/kops/pull/11216).
Any alternatives you've considered: (none)
Hi @samjo-nyang, thanks for sharing this! It is an interesting feature, and we will look into how to support it down the road.
For your information, here is a kubelet patch. It's kind of cheating, but it works.
cmd/kubelet/app/server.go | 47 +++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/cmd/kubelet/app/server.go b/cmd/kubelet/app/server.go
index ff7e05feec1..4ea90d8ec39 100644
--- a/cmd/kubelet/app/server.go
+++ b/cmd/kubelet/app/server.go
@@ -22,6 +22,10 @@ import (
 	"crypto/tls"
 	"errors"
 	"fmt"
+	"github.com/aws/aws-sdk-go/aws"
+	"github.com/aws/aws-sdk-go/aws/ec2metadata"
+	"github.com/aws/aws-sdk-go/aws/session"
+	awsautoscaling "github.com/aws/aws-sdk-go/service/autoscaling"
 	"math"
 	"net"
 	"net/http"
@@ -474,7 +478,50 @@ func makeEventRecorder(kubeDeps *kubelet.Dependencies, nodeName types.NodeName)
 	}
 }
 
+func isInWarmEC2() bool {
+	sess, err := session.NewSession()
+	if err != nil {
+		return false
+	}
+	svc := ec2metadata.New(sess)
+	if !svc.Available() {
+		return false
+	}
+	doc, err := svc.GetInstanceIdentityDocument()
+	if err != nil {
+		return false
+	}
+	instanceId := doc.InstanceID
+	region := doc.Region
+
+	asgClient := awsautoscaling.New(sess, &aws.Config{Region: aws.String(region)})
+	resp, err := asgClient.DescribeAutoScalingInstances(&awsautoscaling.DescribeAutoScalingInstancesInput{
+		InstanceIds: []*string{aws.String(instanceId)},
+	})
+	if err != nil {
+		return false
+	}
+	if len(resp.AutoScalingInstances) != 1 {
+		return false
+	}
+	instance := resp.AutoScalingInstances[0]
+	if instance != nil && instance.LifecycleState != nil && strings.Contains(*instance.LifecycleState, "Warmed") {
+		return true
+	}
+	return false
+}
+
 func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Dependencies, featureGate featuregate.FeatureGate) (err error) {
+	if isInWarmEC2() {
+		go daemon.SdNotify(false, "READY=1")
+
+		select {
+		case <-ctx.Done():
+			break
+		}
+		return nil
+	}
+
 	// Set global feature gates based on the value on the initial KubeletServer
 	err = utilfeature.DefaultMutableFeatureGate.SetFromMap(s.KubeletConfiguration.FeatureGates)
 	if err != nil {
ECS also supports auto-scaling warm pools now, so we should try to implement this in an orchestrator-agnostic way.
To properly handle ECS Auto Scaling warm pools, Bottlerocket should support the ecs-agent parameter ECS_WARM_POOLS_CHECK:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers-create-auto-scaling-group.html#using-warm-pool
Also, I have an additional feature request: I want to pull images while in the warm pool state, before instances go into the hibernated or stopped state. Currently, I use a host container to do this, but I would like something Bottlerocket-supported, such as:
[settings.kubernetes.warm-pool]
enabled = true
prefetch_images = ["aaa", "bbb"]
FYI, this is the entrypoint shell script of the container I used:
#!/bin/bash
echo "Start Image Puller"
# Look up this instance's ID, region, and Auto Scaling group name.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
ASG_NAME=$(aws autoscaling describe-auto-scaling-instances --region "$REGION" --instance-ids "$INSTANCE_ID" | jq -r ".AutoScalingInstances[].AutoScalingGroupName")
echo "Instance Info: $INSTANCE_ID, $REGION, $ASG_NAME"
# Point crictl at the host's CRI socket, mounted under /.bottlerocket/rootfs.
export CONTAINER_RUNTIME_ENDPOINT="unix:///.bottlerocket/rootfs/run/dockershim.sock"
export IMAGE_SERVICE_ENDPOINT="unix:///.bottlerocket/rootfs/run/dockershim.sock"
while ! crictl info > /dev/null 2>&1; do
    echo 'Wait Until containerd is Ready'
    sleep 10
done
# The user data is JSON with an "images" array and a "lifecycle_hook_name" string.
USER_DATA=$(cat /.bottlerocket/host-containers/current/user-data)
echo "$USER_DATA" | jq -r -c '.images[]' | while read -r image; do
    echo "Downloading Image $image"
    crictl pull "$image"
done
LIFECYCLE_HOOK_NAME=$(echo "$USER_DATA" | jq -r '.lifecycle_hook_name')
# Tell the Auto Scaling group that the lifecycle hook's work is finished.
aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$LIFECYCLE_HOOK_NAME" \
    --auto-scaling-group-name "$ASG_NAME" \
    --lifecycle-action-result CONTINUE \
    --instance-id "$INSTANCE_ID" \
    --region "$REGION"
echo "Done CompleteLifecycleAction; Do Infinite Sleep..."
We're still working on this feature and are now targeting the 1.11.0 release.