karpenter-provider-aws icon indicating copy to clipboard operation
karpenter-provider-aws copied to clipboard

fix: use only evictionHard for allocatable capacity calculation

Open moko-poi opened this issue 2 months ago • 10 comments

Fixes #8407

Description

This PR fixes the incorrect eviction threshold calculation in Karpenter's allocatable memory computation. Previously, Karpenter used max(evictionSoft, evictionHard) to determine how much memory to reserve for eviction thresholds, but this is inconsistent with kubelet behavior.

According to the Kubernetes documentation, only evictionHard thresholds should impact allocatable capacity, while evictionSoft thresholds are warning-only and should not reserve memory.

Changes:

  • Modified evictionThreshold() function in pkg/providers/instancetype/types.go to use only evictionHard values
  • Updated test cases to expect evictionHard values instead of the maximum of both thresholds
  • Added clear comments explaining the change and referencing the Kubernetes specification

Example:

  • Before: evictionSoft: 1Gi, evictionHard: 500Mi → reserves 1Gi (incorrect)
  • After: evictionSoft: 1Gi, evictionHard: 500Mi → reserves 500Mi (matches kubelet)

How was this change tested?

  • Updated existing unit tests to verify the new behavior
  • Modified test expectations for the three affected test cases that were checking max(evictionSoft, evictionHard) logic
  • Verified that the code compiles successfully with the changes
  • All tests now validate that only evictionHard values are used for allocatable capacity calculation

Does this change impact docs?

  • [ ] Yes, PR includes docs updates
  • [ ] Yes, issue opened: #
  • [x] No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

moko-poi avatar Oct 05 '25 14:10 moko-poi

Deploy Preview for karpenter-docs-prod ready!

Name Link
Latest commit 06e5403690530c1efe09285af01467ddf7fc16b9
Latest deploy log https://app.netlify.com/projects/karpenter-docs-prod/deploys/6915cd645ec6060008b399c8
Deploy Preview https://deploy-preview-8565--karpenter-docs-prod.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

netlify[bot] avatar Oct 05 '25 14:10 netlify[bot]

Preview deployment ready!

Preview URL: https://pr-8565.d18coufmbnnaag.amplifyapp.com

Built from commit 0fcc0e574fd17d48e6c744cc2c78aafa4fc05030

github-actions[bot] avatar Oct 07 '25 23:10 github-actions[bot]

@DerekFrank 7058ca5 Thank you for reviewing! I've addressed the feedback, so could you please trigger the CI run?

moko-poi avatar Oct 11 '25 11:10 moko-poi

Snapshot successfully published to oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:0-7058ca5e1e36954bbc6a098f9142643945a98d48. To install you must login to the ECR repo with an AWS account:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-7058ca5e1e36954bbc6a098f9142643945a98d48" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

github-actions[bot] avatar Oct 14 '25 17:10 github-actions[bot]

@moko-poi Looks like there are still CI issues. You should be able to run them locally: https://github.com/aws/karpenter-provider-aws/blob/main/Makefile#L52

DerekFrank avatar Oct 14 '25 18:10 DerekFrank

@DerekFrank Thanks for the reminder. I've already run the tests locally and pushed the fixes. The CI should pass now. Please let me know if there are any remaining issues. 0fcc0e5

❯ make ci-test                                                                                                                                                                              ─╯
go test ./pkg/... \
		-cover -coverprofile=coverage.out -outputdir=. -coverpkg=./... \
		--ginkgo.focus="" \
		--ginkgo.randomize-all \
		--ginkgo.vv
?   	github.com/aws/karpenter-provider-aws/pkg/apis	[no test files]
ok  	github.com/aws/karpenter-provider-aws/pkg/apis/v1	14.585s	coverage: 4.1% of statements in ./...
?   	github.com/aws/karpenter-provider-aws/pkg/aws	[no test files]
ok  	github.com/aws/karpenter-provider-aws/pkg/batcher	13.118s	coverage: 6.2% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/cache	0.328s	coverage: 0.7% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/cloudprovider	47.276s	coverage: 49.5% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/cloudprovider/events		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers		coverage: 0.0% of statements
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/capacityreservation/capacitytype	7.752s	coverage: 6.9% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/capacityreservation/expiration	7.463s	coverage: 8.3% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption	9.282s	coverage: 9.3% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/events		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages/noop		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages/rebalancerecommendation		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages/scheduledchange		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages/spotinterruption		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages/statechange		coverage: 0.0% of statements
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/metrics	8.872s	coverage: 19.2% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclaim/garbagecollection	9.603s	coverage: 12.3% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclaim/tagging	14.283s	coverage: 10.0% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclass	32.071s	coverage: 38.9% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclass/garbagecollection	7.968s	coverage: 11.5% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclass/hash	8.103s	coverage: 9.1% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/providers/instancetype	4.627s	coverage: 11.2% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/providers/instancetype/capacity	7.538s	coverage: 14.6% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/providers/pricing	4.622s	coverage: 9.7% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/providers/ssm/invalidation	4.284s	coverage: 9.5% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/controllers/providers/version	4.650s	coverage: 6.8% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/errors		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/fake		coverage: 0.0% of statements
ok  	github.com/aws/karpenter-provider-aws/pkg/operator	5.748s	coverage: 1.8% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/operator/options	0.376s	coverage: 1.1% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/amifamily	5.106s	coverage: 12.6% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/amifamily/bootstrap	0.887s	coverage: 0.2% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/amifamily/bootstrap/mime	0.613s	coverage: 0.7% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/capacityreservation	4.594s	coverage: 6.6% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/instance	8.176s	coverage: 31.5% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/instance/filter	1.293s	coverage: 2.8% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/instanceprofile	8.839s	coverage: 10.2% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/instancetype	20.108s	coverage: 37.3% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/providers/instancetype/offering		coverage: 0.0% of statements
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/launchtemplate	171.026s	coverage: 48.7% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/providers/pricing		coverage: 0.0% of statements
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/securitygroup	4.700s	coverage: 8.6% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/providers/sqs		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/providers/ssm		coverage: 0.0% of statements
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/subnet	5.008s	coverage: 8.4% of statements in ./...
ok  	github.com/aws/karpenter-provider-aws/pkg/providers/version	5.906s	coverage: 5.9% of statements in ./...
	github.com/aws/karpenter-provider-aws/pkg/test		coverage: 0.0% of statements
	github.com/aws/karpenter-provider-aws/pkg/utils		coverage: 0.0% of statements
go tool cover -html coverage.out -o coverage.html

moko-poi avatar Nov 04 '25 16:11 moko-poi

Pull Request Test Coverage Report for Build 19075272556

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 9 of 9 (100.0%) changed or added relevant lines in 1 file are covered.
  • 63 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+0.02%) to 67.456%

Files with Coverage Reduction New Missed Lines %
pkg/providers/amifamily/bootstrap/bottlerocketsettings.go 1 93.55%
pkg/controllers/controllers.go 4 0.0%
pkg/providers/securitygroup/securitygroup.go 4 94.87%
pkg/errors/errors.go 6 50.97%
pkg/providers/amifamily/ami.go 7 94.87%
pkg/controllers/providers/ssm/invalidation/controller.go 8 82.61%
pkg/test/environment.go 8 94.39%
pkg/providers/subnet/subnet.go 12 90.29%
pkg/controllers/nodeclass/controller.go 13 69.05%
<!-- Total: 63
Totals Coverage Status
Change from base Build 18208022291: 0.02%
Covered Lines: 7775
Relevant Lines: 11526

💛 - Coveralls

coveralls avatar Nov 06 '25 22:11 coveralls

@moko-poi Could you rebase so we can get this merged?

DerekFrank avatar Nov 12 '25 20:11 DerekFrank

@DerekFrank Rebased! Ready for review.

moko-poi avatar Nov 13 '25 12:11 moko-poi

@DerekFrank @grosser @sumukha-radhakrishna Hi! Just wanted to gently follow up on this PR. Would appreciate any feedback when you have a chance. Thanks!

moko-poi avatar Nov 28 '25 03:11 moko-poi