Support Warm Pool for EC2 Auto Scaling!

Open samjo-nyang opened this issue 4 years ago • 5 comments

What I'd like: It would be good for Bottlerocket to support warm pools.

Currently, the kubelet on each warm pool instance starts and registers the node with the control plane. The instance is then stopped, so the node falls into the "NotReady" state.

Therefore, I think Bottlerocket instances should (optionally) read their current status and postpone starting the kubelet while they are in the warm pool (similar to https://github.com/kubernetes/kops/pull/11216).

Any alternatives you've considered: (none)

samjo-nyang avatar Oct 27 '21 10:10 samjo-nyang

Hi @samjo-nyang, thanks for sharing this! It is an interesting feature, and we will look into how to support it down the road!

arnaldo2792 avatar Oct 27 '21 17:10 arnaldo2792

For your information, it's kind of cheating... but it works.

 cmd/kubelet/app/server.go | 47 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/cmd/kubelet/app/server.go b/cmd/kubelet/app/server.go
index ff7e05feec1..4ea90d8ec39 100644
--- a/cmd/kubelet/app/server.go
+++ b/cmd/kubelet/app/server.go
@@ -22,6 +22,10 @@ import (
 	"crypto/tls"
 	"errors"
 	"fmt"
+	"github.com/aws/aws-sdk-go/aws"
+	"github.com/aws/aws-sdk-go/aws/ec2metadata"
+	"github.com/aws/aws-sdk-go/aws/session"
+	awsautoscaling "github.com/aws/aws-sdk-go/service/autoscaling"
 	"math"
 	"net"
 	"net/http"
@@ -474,7 +478,50 @@ func makeEventRecorder(kubeDeps *kubelet.Dependencies, nodeName types.NodeName)
 	}
 }
 
+func isInWarmEC2() bool {
+	sess, err := session.NewSession()
+	if err != nil {
+		return false
+	}
+	svc := ec2metadata.New(sess)
+	if !svc.Available() {
+		return false
+	}
+	doc, err := svc.GetInstanceIdentityDocument()
+	if err != nil {
+		return false
+	}
+	instanceId := doc.InstanceID
+	region := doc.Region
+
+	asgClient := awsautoscaling.New(sess, &aws.Config{Region: aws.String(region)})
+	resp, err := asgClient.DescribeAutoScalingInstances(&awsautoscaling.DescribeAutoScalingInstancesInput{
+		InstanceIds: []*string{aws.String(instanceId)},
+	})
+	if err != nil {
+		return false
+	}
+	if len(resp.AutoScalingInstances) != 1 {
+		return false
+	}
+	instance := resp.AutoScalingInstances[0]
+	if instance != nil && instance.LifecycleState != nil && strings.Contains(*instance.LifecycleState, "Warmed") {
+		return true
+	}
+	return false
+}
+
 func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Dependencies, featureGate featuregate.FeatureGate) (err error) {
+	if isInWarmEC2() {
+		go daemon.SdNotify(false, "READY=1")
+
+		select {
+		case <-ctx.Done():
+			break
+		}
+		return nil
+	}
+
 	// Set global feature gates based on the value on the initial KubeletServer
 	err = utilfeature.DefaultMutableFeatureGate.SetFromMap(s.KubeletConfiguration.FeatureGates)
 	if err != nil {
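
The check at the heart of the patch above can be pulled out and tested on its own. This is only a minimal sketch: `isWarmedState` is a hypothetical helper name, and it assumes every warm pool lifecycle state starts with the "Warmed:" prefix (for example "Warmed:Stopped"). That makes it slightly stricter than the `strings.Contains` check in the patch.

```go
package main

import (
	"fmt"
	"strings"
)

// isWarmedState reports whether an Auto Scaling lifecycle state belongs to a
// warm pool. Assumption: warm pool states carry a "Warmed:" prefix, such as
// "Warmed:Pending" or "Warmed:Stopped".
func isWarmedState(state string) bool {
	return strings.HasPrefix(state, "Warmed:")
}

func main() {
	// Sample lifecycle states; only the first two are warm pool states.
	states := []string{"Warmed:Pending", "Warmed:Stopped", "InService", "Pending:Wait"}
	for _, s := range states {
		fmt.Printf("%-16s warmed=%v\n", s, isWarmedState(s))
	}
}
```

With a helper like this, the kubelet would be deferred only while the state is a warm pool state and would start normally once the instance moves to a state such as "InService".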

samjo-nyang avatar Mar 04 '22 00:03 samjo-nyang

ECS also supports auto-scaling warm pools now, so we should try to implement this in an orchestrator-agnostic way.

bcressey avatar Mar 28 '22 17:03 bcressey

To properly handle ECS Auto Scaling warm pools, Bottlerocket should support the ecs-agent parameter ECS_WARM_POOLS_CHECK:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers-create-auto-scaling-group.html#using-warm-pool

mello7tre avatar Apr 07 '22 13:04 mello7tre

Also, I have an additional feature request: I want to pull images while instances are in the warm pool, before they go to the hibernated or stopped state. Currently, I use a host container to do this, but I would like something supported by Bottlerocket, such as:

[settings.kubernetes.warm-pool]
enabled = true
prefetch_images = ["aaa", "bbb"]

FYI, this is the entrypoint shell script of the container I use:

#!/bin/bash

echo "Start Image Puller"

# Look up this instance's ID, region, and Auto Scaling group.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
ASG_NAME=$(aws autoscaling describe-auto-scaling-instances --region "$REGION" --instance-ids "$INSTANCE_ID" | jq -r ".AutoScalingInstances[].AutoScalingGroupName")
echo "Instance Info: $INSTANCE_ID, $REGION, $ASG_NAME"

# Point crictl at the host's container runtime socket and wait until it responds.
export CONTAINER_RUNTIME_ENDPOINT="unix:///.bottlerocket/rootfs/run/dockershim.sock"
export IMAGE_SERVICE_ENDPOINT="unix:///.bottlerocket/rootfs/run/dockershim.sock"
while ! crictl info > /dev/null 2>&1; do
  echo 'Wait Until containerd is Ready'
  sleep 10
done

# Pull every image listed in the host container's user data.
USER_DATA=$(cat /.bottlerocket/host-containers/current/user-data)
echo "$USER_DATA" | jq -r -c '.images[]' | while read -r image; do
  echo "Downloading Image $image"
  crictl pull "$image"
done

# Tell the Auto Scaling group that the lifecycle action is complete.
LIFECYCLE_HOOK_NAME=$(echo "$USER_DATA" | jq -r '.lifecycle_hook_name')
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name "$LIFECYCLE_HOOK_NAME" \
  --auto-scaling-group-name "$ASG_NAME" \
  --lifecycle-action-result CONTINUE \
  --instance-id "$INSTANCE_ID" \
  --region "$REGION"
echo "Done CompleteLifecycleAction; Do Infinite Sleep..."

# Keep the host container running.
sleep infinity
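
For reference, the script expects the host container's user data to be JSON with an `images` array and a `lifecycle_hook_name` string. A rough Go sketch of parsing and validating that shape could look like the following. The `pullerConfig` type, the `parsePullerConfig` function, and the sample values are hypothetical names chosen for illustration; they are not part of Bottlerocket.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// pullerConfig mirrors the two user-data fields the script reads with jq:
// .images[] and .lifecycle_hook_name. The type name is hypothetical.
type pullerConfig struct {
	Images            []string `json:"images"`
	LifecycleHookName string   `json:"lifecycle_hook_name"`
}

// parsePullerConfig decodes the user data and rejects input without a
// lifecycle hook name, since the lifecycle action could not be completed.
func parsePullerConfig(data []byte) (pullerConfig, error) {
	var cfg pullerConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return pullerConfig{}, fmt.Errorf("parse user data: %w", err)
	}
	if cfg.LifecycleHookName == "" {
		return pullerConfig{}, errors.New("user data is missing lifecycle_hook_name")
	}
	return cfg, nil
}

func main() {
	// Sample user data; the image and hook names are made up.
	sample := []byte(`{"images": ["example.com/app:1.0"], "lifecycle_hook_name": "warm-pool-hook"}`)
	cfg, err := parsePullerConfig(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("images:", cfg.Images, "hook:", cfg.LifecycleHookName)
}
```

Validating the hook name up front avoids pulling every image only to fail at the final `complete-lifecycle-action` call.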

samjo-nyang avatar Apr 15 '22 01:04 samjo-nyang

We're still working on this feature and are now targeting the 1.11.0 release.

gthao313 avatar Sep 30 '22 16:09 gthao313