operator-sdk

Guide on using node affinity to safely schedule pods in heterogeneous clusters.

Open jaypoulz opened this issue 3 years ago • 5 comments

Description of the change: Added a guide for heterogeneous operators that covers setting nodeAffinity for pod-defining objects so pods aren't scheduled to nodes of incompatible architectures.

Motivation for the change: There is already a guide for creating an operator with support for multiple architectures, but this is not sufficient to cover the nuances of scheduling pods on a cluster with nodes of mixed architecture. This guide leverages the existing sample applications to deep dive into the changes needed for operators of each type (ansible, go, helm).

jaypoulz avatar Aug 10 '22 21:08 jaypoulz
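
For readers skimming the thread, the kind of `nodeAffinity` stanza the description refers to could look roughly like the sketch below. It is not taken from the guide itself: the helper names `arch_node_affinity` and `with_arch_affinity` and the example image are hypothetical, and the sketch assumes nodes carry the well-known `kubernetes.io/arch` label. It only builds the dictionary that would be serialized into a pod spec.

```python
import json

# Well-known node label that the kubelet sets to the node's CPU architecture.
ARCH_LABEL = "kubernetes.io/arch"


def arch_node_affinity(architectures):
    """Build a pod-spec `affinity` stanza that requires one of the given architectures.

    Hypothetical helper for illustration only; not part of Operator SDK.
    """
    if not architectures:
        raise ValueError("at least one architecture is required")
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": ARCH_LABEL,
                                "operator": "In",
                                # De-duplicate and sort so the output is stable.
                                "values": sorted(set(architectures)),
                            }
                        ]
                    }
                ]
            }
        }
    }


def with_arch_affinity(pod_spec, architectures):
    """Return a shallow copy of pod_spec with its `affinity` replaced.

    Note: any affinity already present on pod_spec is overwritten, not merged.
    """
    updated = dict(pod_spec)
    updated["affinity"] = arch_node_affinity(architectures)
    return updated


if __name__ == "__main__":
    # Hypothetical container; the image name is a placeholder.
    spec = {"containers": [{"name": "app", "image": "example.com/app:latest"}]}
    print(json.dumps(with_arch_affinity(spec, ["amd64", "arm64"]), indent=2))
```

The list passed in should match the architectures the operand image is actually built for; a pod-defining object without such a constraint may land on a node whose architecture the image does not support.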

I'd expect to see some discussion of subscription.spec.config.nodeSelector and subscription.spec.nodeAffinity (new in 4.12), and how they can also set nodeSelector/affinity values on the operator deployment.

bparees avatar Sep 09 '22 12:09 bparees
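
As a rough illustration of the Subscription-level setting mentioned above, the sketch below builds an OLM `Subscription` manifest whose `spec.config.nodeSelector` pins the operator deployment to one architecture. The function name and every argument value are hypothetical, and only the `nodeSelector` field is shown; the affinity field mentioned in the comment is left out because its exact shape is not spelled out in this thread.

```python
import json


def subscription_with_node_selector(
    name, namespace, package, channel, source, source_namespace, node_selector
):
    """Build an OLM Subscription manifest with spec.config.nodeSelector set.

    Hypothetical helper for illustration; the values come from the caller.
    """
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,
            "channel": channel,
            "source": source,
            "sourceNamespace": source_namespace,
            # OLM applies this selector to the operator's own deployment pods.
            "config": {"nodeSelector": dict(node_selector)},
        },
    }


if __name__ == "__main__":
    # All names below are placeholders, not real catalog entries.
    manifest = subscription_with_node_selector(
        name="example-operator",
        namespace="operators",
        package="example-operator",
        channel="stable",
        source="example-catalog",
        source_namespace="olm",
        node_selector={"kubernetes.io/arch": "arm64"},
    )
    print(json.dumps(manifest, indent=2))
```

This matters to operator authors because a cluster admin can use it to steer the operator pod independently of whatever affinity the operator sets on its operands.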

@bparees I completely agree that the subscription is an important topic. I need to check with @perdasilva in terms of where those docs should live in full, since I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though.

jaypoulz avatar Sep 09 '22 16:09 jaypoulz

I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though.

Yup, my original draft of my comment said something like "this may not belong in SDK docs, but..." and then I got lazy and deleted it.

So yeah, this may not be the spot to document it fully, but these docs should reference the functionality since it has implications for how someone's operator may get configured/installed by the admin.

bparees avatar Sep 09 '22 16:09 bparees

So yeah, this may not be the spot to document it fully, but these docs should reference the functionality since it has implications for how someone's operator may get configured/installed by the admin.

I agree, mentioning them here is a good idea.

jmrodri avatar Sep 09 '22 21:09 jmrodri

Besides updates to address the discussion above, I'm working with Camila to get https://github.com/kubernetes-sigs/kubebuilder/pull/2906 nailed down before updating this to the next draft, since much of the same guidance will be present.

jaypoulz avatar Sep 13 '22 19:09 jaypoulz

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Dec 22 '22 01:12 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot avatar Jan 22 '23 00:01 openshift-bot

/remove-lifecycle rotten

jaypoulz avatar Jan 23 '23 14:01 jaypoulz

/lifecycle frozen

(just so it stops making it rotten)

everettraven avatar Jan 23 '23 14:01 everettraven

@everettraven: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

(just so it stops making it rotten)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Jan 23 '23 14:01 openshift-ci[bot]

A refresh for this PR is coming soon. I prioritized work on:

  • https://github.com/operator-framework/api/pull/276
  • https://github.com/operator-framework/audit/pull/144

jaypoulz avatar Apr 12 '23 16:04 jaypoulz

/unhold

jaypoulz avatar May 22 '23 18:05 jaypoulz

@jmrodri I think this is ready for another review if you're up for it. :)

jaypoulz avatar May 22 '23 18:05 jaypoulz

/retest

jaypoulz avatar May 30 '23 15:05 jaypoulz

@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

openshift-ci[bot] avatar May 30 '23 15:05 openshift-ci[bot]

/retest

jaypoulz avatar Jun 10 '23 02:06 jaypoulz

@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

openshift-ci[bot] avatar Jun 10 '23 02:06 openshift-ci[bot]

I believe this draft is ready for final review. :) @everettraven the only difference between this draft and the one you reviewed is the formatting of the internal links according to the hugo.io standard.

jaypoulz avatar Jun 12 '23 18:06 jaypoulz

Discovered one last inconsistency after the latest rebase, but it should be all clear now, as tests are passing locally.

jaypoulz avatar Jun 12 '23 19:06 jaypoulz

Thanks guys! :)

jaypoulz avatar Jun 13 '23 23:06 jaypoulz