feat: support unknown resources

Open universam1 opened this issue 2 years ago • 7 comments

Description

Karpenter cannot be used on clusters whose pods request custom resources, such as those exposed by device drivers like /dev/fuse (used with Podman), and many more (see references).

The following error is logged:

karpenter-778b9dbc4f-gk88t {"level":"ERROR",..."logger":"controller.provisioner","message":"Could not schedule pod, incompatible with provisioner \"default\", daemonset overhead={\"cpu\":\"562m\",\"memory\":\"758026799\",\"pods\":\"10\"}, no instance type satisfied resources {\"cpu\":\"1562m\",\"memory\":\"1831768623\",\"pods\":\"11\",\"smarter-devices/fuse\":\"1\"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-generation Exists >2, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-size NotIn [medium micro nano small], karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/provisioner-name In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], node.kubernetes.io/node-group In [primary] (no instance type has enough resources)"}
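
To illustrate why the log ends with "no instance type has enough resources", here is a minimal sketch of a resource fit check. It is not Karpenter's actual code: the `fits` function, the integer units, and the instance capacity values are hypothetical, and it assumes that a resource an instance type does not advertise counts as zero capacity.

```python
# Hypothetical sketch, not Karpenter's implementation.
# Quantities are simplified to plain integers (CPU in millicores, memory in bytes).

def fits(requests: dict, allocatable: dict) -> bool:
    """Return True if every requested resource is covered by the instance type.

    Assumption: a resource missing from `allocatable` is treated as 0.
    """
    return all(allocatable.get(name, 0) >= qty for name, qty in requests.items())


# Requests taken from the log line above (pod plus daemonset overhead).
pod_requests = {
    "cpu": 1562,
    "memory": 1831768623,
    "pods": 11,
    "smarter-devices/fuse": 1,
}

# Made-up capacity for illustration; note it has no "smarter-devices/fuse" entry.
instance_allocatable = {"cpu": 4000, "memory": 8 * 1024**3, "pods": 58}

print(fits(pod_requests, instance_allocatable))  # False
```

Under that assumption, the single unknown resource is enough to reject every instance type, even when CPU, memory, and pod count all fit.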

This PR adds a setting that tells Karpenter to ignore a defined list of resources, which allows Karpenter to be used on these clusters.

apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-global-settings
data:
  ignoredDeviceRequests: "smarter-devices/fuse,some-other-device"
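
As a rough illustration of how such a setting could be applied, the sketch below parses the comma-separated `ignoredDeviceRequests` value and drops the listed resources from a pod's requests before any fit check. The helper names `parse_ignored` and `drop_ignored` are hypothetical and do not come from this PR; the actual change lives in Go under pkg/utils/resources/resources.go and may work differently.

```python
# Hypothetical sketch of the idea behind ignoredDeviceRequests; not the PR's code.

def parse_ignored(value: str) -> set:
    """Split the comma-separated setting into a set of resource names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def drop_ignored(requests: dict, ignored: set) -> dict:
    """Return a copy of `requests` without the ignored resource names."""
    return {name: qty for name, qty in requests.items() if name not in ignored}


ignored = parse_ignored("smarter-devices/fuse,some-other-device")

# Requests from the error log, simplified to integers.
requests = {"cpu": 1562, "memory": 1831768623, "pods": 11, "smarter-devices/fuse": 1}

print(drop_ignored(requests, ignored))
# {'cpu': 1562, 'memory': 1831768623, 'pods': 11}
```

With the device request filtered out, only the resources that instance types actually advertise remain for the provisioner to match.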

Fixes https://github.com/aws/karpenter-core/issues/751
Fixes https://github.com/aws/karpenter/issues/2390
Fixes https://github.com/aws/karpenter/issues/2899
Fixes https://github.com/navvis-dev/karpenter/pull/3
Fixes https://github.com/aws/karpenter/issues/3535
Fixes https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/3717
Fixes https://github.com/aws/karpenter-core/issues/308
Fixes https://github.com/aws/karpenter/issues/3315
Fixes https://github.com/aws/karpenter/issues/3693

How was this change tested?

This fork runs in dozens of production clusters.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

universam1 avatar Oct 13 '23 08:10 universam1

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

github-actions[bot] avatar Oct 27 '23 12:10 github-actions[bot]

bump

universam1 avatar Oct 27 '23 14:10 universam1

Hey @universam1, we've deprecated the ConfigMap as part of the v1beta1 APIs, so we won't be accepting any changes to it. More details here: https://karpenter.sh/docs/upgrading/v1beta1-migration/

As another point, it looks like there are CI failures, and this is a fairly complex problem that warrants a design. Can you come to the working group or kubernetes/karpenter-dev to discuss?

njtran avatar Oct 31 '23 22:10 njtran

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla /easycla

k8s-triage-robot avatar Jan 19 '24 17:01 k8s-triage-robot

CLA Not Signed

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: universam1

Once this PR has been reviewed and has the lgtm label, please assign jonathan-innis for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Feb 28 '24 13:02 k8s-ci-robot

Pull Request Test Coverage Report for Build 8092652375

Details

  • 1 of 2 (50.0%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 80.97%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| pkg/utils/resources/resources.go | 1 | 2 | 50.0% |

| Files with Coverage Reduction | New Missed Lines | % |
| pkg/controllers/disruption/expiration.go | 2 | 90.91% |

| Totals | Coverage Status |
| Change from base Build 8060854800 | 0.01% |
| Covered Lines | 8178 |
| Relevant Lines | 10100 |

💛 - Coveralls

coveralls avatar Feb 29 '24 08:02 coveralls

Consider looking at #1305! This is our first iteration at solving this problem more comprehensively!

jonathan-innis avatar Jun 11 '24 07:06 jonathan-innis

Closing in favor of #1305. Thank you @jonathan-innis for the effort!

universam1 avatar Jun 11 '24 07:06 universam1