node-feature-discovery

helm: better support parallel deployments

Open marquiz opened this issue 3 years ago • 8 comments

What would you like to be added:

Spun out from https://github.com/kubernetes-sigs/node-feature-discovery/pull/640#issuecomment-968687743:

Other Helm apps may also want to add NFD as a dependency. Actually, the custom/local NFD sources look like they were designed for exactly that. What needs to be done here is to make sure there is a configuration for adding NFD as a Helm chart dependency in a way that it won't overlap with other NFD Helm chart dependencies and/or other NFD instances that may be present in the cluster (e.g. the volume mounts that configure the local source are one such overlap, and this PR is another).

We should properly support multiple parallel Helm deployments. That is, isolate them sufficiently to avoid races and clashes.
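For illustration, a consuming chart could declare NFD as a dependency in its Chart.yaml roughly like this. This is a minimal sketch only: the consumer chart name, repository URL, version constraint and condition key are assumptions for the example, not something this issue prescribes.

```yaml
# Chart.yaml of a hypothetical consumer chart "my-app"; repository URL,
# version constraint and condition key are illustrative assumptions
apiVersion: v2
name: my-app
version: 0.1.0
dependencies:
  - name: node-feature-discovery
    version: ">=0.11.0"
    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
    condition: node-feature-discovery.enabled
```

The open question in this issue is what such a consumer (or the NFD chart itself) has to set so that two charts embedding NFD this way do not clash over CRDs, hostPath mounts and other shared, non-namespaced resources.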

Why is this needed:

Helm makes it possible to have multiple parallel deployments, and we should do our best to support this.

marquiz avatar Nov 16 '21 11:11 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 14 '22 12:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 16 '22 13:03 k8s-triage-robot

Still a valid issue; helping hands would be welcome.

/remove-lifecycle rotten

marquiz avatar Mar 16 '22 16:03 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 14 '22 17:06 k8s-triage-robot

#831 certainly is one step in solving this issue.

@jasine do you have any comments on this issue? Have you tried multiple parallel Helm-based NFD deployments?

/remove-lifecycle stale

marquiz avatar Jul 08 '22 12:07 marquiz

> #831 certainly is one step in solving this issue.
>
> @jasine do you have any comments on this issue? Have you tried multiple parallel Helm-based NFD deployments?
>
> /remove-lifecycle stale

@marquiz I just tried to make a second deployment on a cluster and it failed with the following error:

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "nodefeaturerules.nfd.k8s-sigs.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "nfd-2": current value is "nfd-1"

The reason is that the CRD nodefeaturerules.nfd.k8s-sigs.io is placed under the manifests folder, while Helm encourages placing CRDs in the crds folder.

When I renamed manifests to crds, deploying multiple parallel Helm-based NFD releases succeeded.
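For context: Helm 3 treats a top-level crds/ directory specially, installing its contents only when the CRDs are not already present and skipping them otherwise, instead of failing on release-ownership annotations as in the error above. A rough layout sketch (file names are illustrative, not the chart's actual ones):

```
node-feature-discovery/          # chart root (layout sketch)
├── Chart.yaml
├── values.yaml
├── crds/                        # CRDs here are installed once and skipped
│   └── nfd-crds.yaml            # if already present (file name illustrative)
└── templates/                   # regular, release-owned manifests
    └── ...
```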

jasine avatar Jul 09 '22 05:07 jasine

I think that, as of NFD v0.11.1, there is no "real" need for parallel deployments. Even custom feature sources can be applied with a simple NodeFeatureRule. A cluster-scoped NFD deployment is just enough to address the needs of every app that either uses the out-of-the-box feature labels, dynamically specifies new features using the local filesystem, or registers custom rules through the operator.

Having said that, I wouldn't suggest using NFD as a direct Helm dependency, but instead as a requirement in a higher layer (e.g. Helmfile).
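As an illustration of that alternative, a Helmfile could pin NFD as a single cluster-level release that application releases depend on. This is a sketch only; the repository URL, release names and namespaces are assumptions.

```yaml
# helmfile.yaml: one shared NFD release, with a hypothetical app release
# declaring it as a requirement via `needs`
repositories:
  - name: nfd
    url: https://kubernetes-sigs.github.io/node-feature-discovery/charts

releases:
  - name: node-feature-discovery
    namespace: node-feature-discovery
    chart: nfd/node-feature-discovery

  - name: my-app                      # hypothetical consumer of NFD labels
    namespace: my-app
    chart: ./charts/my-app
    needs:
      - node-feature-discovery/node-feature-discovery   # namespace/release
```

That keeps a single NFD instance owning the cluster-scoped resources, while each application remains a plain Helm release.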

eliaskoromilas avatar Jul 11 '22 09:07 eliaskoromilas

> I think that, as of NFD v0.11.1, there is no "real" need for parallel deployments. Even custom feature sources can be applied with a simple NodeFeatureRule. A cluster-scoped NFD deployment is just enough to address the needs of every app that either uses the out-of-the-box feature labels, dynamically specifies new features using the local filesystem, or registers custom rules through the operator.

Yeah, I fully agree on this 👍 We're really trying to make it unnecessary to have multiple parallel NFD deployments. The `NodeFeatureRule` already causes some fuss with parallel deployments, as it's a cluster-scoped (non-namespaced) resource and, by default, only the default instance processes those resources. #828 will complicate matters further, probably making parallel NFD installations unsupported when/if gRPC communication is dropped from NFD. So I think I'm not going to invest my time in this issue anymore.

Having said that, I'm still open to contributions if somebody wants to work on this. Currently I see two shortcomings in parallel installs: the CRDs (thanks @jasine) and the hostPath mounts for hooks and feature files (/etc/kubernetes/node-feature-discovery/{source.d/ | features.d/}). The CRDs would be an easy fix, just a rename. For the mounts we could think about "namespacing" the host dirs, e.g. /etc/kubernetes/node-feature-discovery/source.d.{INSTANCE}/
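A rough sketch of what such "namespacing" could look like in the worker DaemonSet template, assuming a hypothetical .Values.instance value used to suffix the host directories (this is not an existing chart option):

```yaml
# DaemonSet volume snippet (sketch): per-instance hostPath directories,
# keyed on a hypothetical .Values.instance value; the hostPath type is
# illustrative as well
volumes:
  - name: source-d
    hostPath:
      path: /etc/kubernetes/node-feature-discovery/source.d.{{ .Values.instance }}/
      type: DirectoryOrCreate
  - name: features-d
    hostPath:
      path: /etc/kubernetes/node-feature-discovery/features.d.{{ .Values.instance }}/
      type: DirectoryOrCreate
```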

marquiz avatar Aug 09 '22 06:08 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 07 '22 07:11 k8s-triage-robot

/remove-lifecycle stale

vaibhav2107 avatar Nov 14 '22 07:11 vaibhav2107

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 12 '23 08:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 14 '23 08:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Apr 13 '23 09:04 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 13 '23 09:04 k8s-ci-robot