aws-node-termination-handler

filter non-ASG nodes by tag

Open universam1 opened this issue 3 years ago • 6 comments

Describe the feature

Karpenter nodes are not bound to an Auto Scaling group (ASG). Filtering on tags via check-asg-tag-before-draining is scoped to ASGs only, so it is not possible to filter such nodes based on tags. https://github.com/aws/aws-node-termination-handler/blob/2048e9263d2b039a22cffd2c34b55e7658b52261/pkg/monitor/sqsevent/sqs-monitor.go#L356-L365

Is the feature request related to a problem?

AFAIK, pre-filtering the Spot ITN events is not possible, because EC2 Spot Instance Interruption Warning events do not contain the tags of the instance, so there is nothing tag-related for the rule to filter on.
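To illustrate the point, here is a minimal Go sketch that parses a sample event modeled on the documented EC2 Spot Instance Interruption Warning format and lists the keys found under `detail`. The account, region, and instance IDs are placeholders, and the struct and function names (`spotInterruptionEvent`, `detailKeys`) are hypothetical, not taken from NTH.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// spotInterruptionEvent is a hypothetical struct covering the fields of
// the event that matter here. It is not a type from NTH or the AWS SDK.
type spotInterruptionEvent struct {
	DetailType string            `json:"detail-type"`
	Source     string            `json:"source"`
	Resources  []string          `json:"resources"`
	Detail     map[string]string `json:"detail"`
}

// sampleEvent follows the documented event layout; all IDs are placeholders.
const sampleEvent = `{
  "version": "0",
  "id": "12345678-1234-1234-1234-123456789012",
  "detail-type": "EC2 Spot Instance Interruption Warning",
  "source": "aws.ec2",
  "account": "123456789012",
  "time": "2022-06-17T09:06:00Z",
  "region": "us-east-2",
  "resources": ["arn:aws:ec2:us-east-2:123456789012:instance/i-1234567890abcdef0"],
  "detail": {
    "instance-id": "i-1234567890abcdef0",
    "instance-action": "terminate"
  }
}`

// detailKeys returns the sorted keys of the event's "detail" object.
func detailKeys(raw string) ([]string, error) {
	var ev spotInterruptionEvent
	if err := json.Unmarshal([]byte(raw), &ev); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(ev.Detail))
	for k := range ev.Detail {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	keys, err := detailKeys(sampleEvent)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Only the instance ID and the action are present; there are no tags.
	fmt.Println(keys)
}
```

Running this prints `[instance-action instance-id]`: the event carries the instance ID and the action, but no instance tags that an event rule could match on.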

I have several clusters in one account, so I am missing an option in NTH to restrict it to the nodes of the respective cluster only.

Describe alternatives you've considered

none

universam1 avatar Jun 17 '22 09:06 universam1

Hi @universam1! Can you help confirm the ask? It sounds like you want the ability for NTH to ignore nodes that are not associated with an ASG, using the same ManagedAsgTag we use today for nodes in ASGs (name might need to be revisited...).

"I have to bound NTH to the cluster respective nodes only."

Can you help clarify what you mean, assuming this is your current workaround to the issue?

AustinSiu avatar Jun 22 '22 23:06 AustinSiu

@AustinSiu Thank you for asking - actually the opposite 😄

I want NTH to manage non-ASG nodes that are created by Karpenter. In addition, I need to filter based on tags to identify the nodes of the cluster. However, the ManagedAsgTag filter is not applicable to non-ASG nodes, see above.

"assuming this is your current workaround to the issue?"

Sorry if I was unclear, I have no workaround for this problem.

universam1 avatar Jun 27 '22 06:06 universam1

👋 @universam1 you are right and this is a regression. We used to have an AssumeASGTagPropagation configuration that would result in no ASG lookup, but that was removed recently. We should add that back.

https://github.com/aws/aws-node-termination-handler/commit/37e989956e163b027cd2dcc04a58ce89a75244c6#r78210273
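As a rough illustration of the behavior described in this comment (not NTH's actual code), the following Go sketch shows how a flag that skips the ASG lookup lets a tag filter match nodes that have no ASG. All names here (`node`, `tagLookup`, `staticTags`, `isManaged`) and the tag key `example/managed` are hypothetical, and the lookups are in-memory stand-ins for AWS API calls.

```go
package main

import "fmt"

// node is a hypothetical view of an instance; ASGName is empty for
// nodes that are not part of an Auto Scaling group (e.g. Karpenter nodes).
type node struct {
	InstanceID string
	ASGName    string
}

// tagLookup is a hypothetical stand-in for an API call returning tags.
type tagLookup func(id string) (map[string]string, error)

// staticTags builds an in-memory tagLookup for the example.
func staticTags(all map[string]map[string]string) tagLookup {
	return func(id string) (map[string]string, error) {
		return all[id], nil
	}
}

// isManaged reports whether the node carries the managed tag.
// With assumeTagPropagation set, only the instance's own tags are checked
// and no ASG lookup happens, so non-ASG nodes can still match.
func isManaged(n node, managedTag string, assumeTagPropagation bool,
	instanceTags, asgTags tagLookup) (bool, error) {
	if assumeTagPropagation {
		tags, err := instanceTags(n.InstanceID)
		if err != nil {
			return false, err
		}
		_, ok := tags[managedTag]
		return ok, nil
	}
	if n.ASGName == "" {
		// No ASG to look up: the node can never pass an ASG-scoped filter.
		return false, nil
	}
	tags, err := asgTags(n.ASGName)
	if err != nil {
		return false, err
	}
	_, ok := tags[managedTag]
	return ok, nil
}

func main() {
	const managedTag = "example/managed" // hypothetical tag key
	instanceTags := staticTags(map[string]map[string]string{
		"i-karpenter": {managedTag: ""},
	})
	asgTags := staticTags(nil) // the Karpenter node has no ASG

	n := node{InstanceID: "i-karpenter"}
	for _, assume := range []bool{false, true} {
		ok, err := isManaged(n, managedTag, assume, instanceTags, asgTags)
		if err != nil {
			fmt.Println("lookup error:", err)
			return
		}
		fmt.Printf("assumeTagPropagation=%v managed=%v\n", assume, ok)
	}
}
```

In this sketch the Karpenter-style node is reported as unmanaged while the ASG lookup is required, and as managed once the instance tags are consulted directly.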

bwagner5 avatar Jul 11 '22 15:07 bwagner5

I've just encountered a similar issue. I was about to post a potential workaround, tagging your instance with aws:autoscaling:groupName, but I forgot that the aws: prefix is reserved, so users can't assign such a tag.

We are looking forward to the solution. @bwagner5 not sure what Priority:Critical means. Can you shed some light on a possible timeframe for a resolution? Not sure if we should switch to IMDS or just wait for the fix, so any estimate (days, weeks, months?) would be very useful.

pangorgo avatar Jul 12 '22 21:07 pangorgo

@pangorgo likely a week or so to get the change in and tested. A current workaround is to roll back to v1.16.2.

bwagner5 avatar Jul 12 '22 22:07 bwagner5

Thanks @bwagner5 I've just tested it with version v1.16.2 and it works 🎉 I just had to add ASSUME_ASG_TAG_PROPAGATION=true

pangorgo avatar Jul 12 '22 23:07 pangorgo

This has been released in NTH app version v1.17.0. Helm chart version v0.19.0.

snay2 avatar Aug 18 '22 16:08 snay2