node-feature-discovery

helm: better support parallel deployments

Open marquiz opened this issue 3 years ago • 8 comments

What would you like to be added:

Spun out from https://github.com/kubernetes-sigs/node-feature-discovery/pull/640#issuecomment-968687743:

Other Helm apps may also want to add NFD as a dependency. Actually, the custom/local NFD sources look like they were designed for exactly that. What needs to be done here is to make sure there is a configuration for adding NFD as a Helm chart dependency in a way that it won't overlap with other NFD Helm chart dependencies and/or other NFD instances that may be present in the cluster (e.g. the volume mounts that configure the local source are one such overlap, and this PR is another).

We should properly support multiple parallel Helm deployments. That is, isolate them sufficiently to avoid races and clashes.
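For illustration, a consuming chart could declare NFD as a dependency in its Chart.yaml roughly like this. This is a minimal sketch only: the consumer chart name, repository URL, version constraint and condition key are assumptions for the example, not something this issue prescribes.

```yaml
# Chart.yaml of a hypothetical consumer chart "my-app"; repository URL,
# version constraint and condition key are illustrative assumptions
apiVersion: v2
name: my-app
version: 0.1.0
dependencies:
  - name: node-feature-discovery
    version: ">=0.11.0"
    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
    condition: node-feature-discovery.enabled
```

The open question in this issue is what such a consumer (or the NFD chart itself) has to set so that two charts embedding NFD this way do not clash over CRDs, hostPath mounts and other shared, non-namespaced resources.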

Why is this needed:

Helm makes it possible to have multiple parallel deployments, and we should do our best to support this.

marquiz avatar Nov 16 '21 11:11 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 14 '22 12:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 16 '22 13:03 k8s-triage-robot

Still a valid issue; helping hands would be welcome.

/remove-lifecycle rotten

marquiz avatar Mar 16 '22 16:03 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 14 '22 17:06 k8s-triage-robot

#831 certainly is one step in solving this issue.

@jasine do you have any comments on this issue? Have you tried multiple parallel Helm-based NFD deployments?

/remove-lifecycle stale

marquiz avatar Jul 08 '22 12:07 marquiz

> #831 certainly is one step in solving this issue.
>
> @jasine do you have any comments on this issue? Have you tried multiple parallel Helm-based NFD deployments?
>
> /remove-lifecycle stale

@marquiz I just tried to make a second deployment on a cluster and it failed with the following error:

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "nodefeaturerules.nfd.k8s-sigs.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "nfd-2": current value is "nfd-1"

The reason is that the CRD nodefeaturerules.nfd.k8s-sigs.io is placed under the manifests folder, while Helm encourages placing CRDs in the crds folder.

When I renamed manifests to crds, deploying multiple parallel Helm-based NFD releases succeeded.
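For context: Helm 3 treats a top-level crds/ directory specially, installing its contents only when the CRDs are not already present and skipping them otherwise, instead of failing on release-ownership annotations as in the error above. A rough layout sketch (file names are illustrative, not the chart's actual ones):

```
node-feature-discovery/          # chart root (layout sketch)
├── Chart.yaml
├── values.yaml
├── crds/                        # CRDs here are installed once and skipped
│   └── nfd-crds.yaml            # if already present (file name illustrative)
└── templates/                   # regular, release-owned manifests
    └── ...
```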

jasine avatar Jul 09 '22 05:07 jasine

I think that, as of NFD v0.11.1, there is no "real" need for parallel deployments. Even custom feature sources can be applied with a simple NodeFeatureRule. A cluster-scoped NFD deployment is just enough to address the needs of every app that either uses the out-of-the-box feature labels, dynamically specifies new features using the local filesystem, or registers custom rules through the operator.

Having said that, I wouldn't suggest using NFD as a direct Helm dependency, but instead as a requirement in a higher layer (e.g. Helmfile).
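As an illustration of that alternative, a Helmfile could pin NFD as a single cluster-level release that application releases depend on. This is a sketch only; the repository URL, release names and namespaces are assumptions.

```yaml
# helmfile.yaml: one shared NFD release, with a hypothetical app release
# declaring it as a requirement via `needs`
repositories:
  - name: nfd
    url: https://kubernetes-sigs.github.io/node-feature-discovery/charts

releases:
  - name: node-feature-discovery
    namespace: node-feature-discovery
    chart: nfd/node-feature-discovery

  - name: my-app                      # hypothetical consumer of NFD labels
    namespace: my-app
    chart: ./charts/my-app
    needs:
      - node-feature-discovery/node-feature-discovery   # namespace/release
```

That keeps a single NFD instance owning the cluster-scoped resources, while each application remains a plain Helm release.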

eliaskoromilas avatar Jul 11 '22 09:07 eliaskoromilas

> I think that, as of NFD v0.11.1, there is no "real" need for parallel deployments. Even custom feature sources can be applied with a simple NodeFeatureRule. A cluster-scoped NFD deployment is just enough to address the needs of every app that either uses the out-of-the-box feature labels, dynamically specifies new features using the local filesystem, or registers custom rules through the operator.

Yeah, I fully agree on this 👍 We're really trying to make it unnecessary to have multiple parallel NFD deployments. The `NodeFeatureRule` already causes some fuss with parallel deployments, as it's a cluster-scoped (non-namespaced) resource and, by default, only the default instance processes those resources. #828 will complicate matters further, probably making parallel NFD installations unsupported when/if gRPC communication is dropped from NFD. So I think I'm not going to invest my time in this issue anymore.

Having said that, I'm still open to contributions if somebody wants to work on this. Currently I see two shortcomings in parallel installs: the CRDs (thanks @jasine) and the hostPath mounts for hooks and feature files (/etc/kubernetes/node-feature-discovery/{source.d/ | features.d/}). The CRDs would be an easy fix, just a rename. For the mounts we could think about "namespacing" the host dirs, e.g. /etc/kubernetes/node-feature-discovery/source.d.{INSTANCE}/
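A rough sketch of what such "namespacing" could look like in the worker DaemonSet template, assuming a hypothetical .Values.instance value used to suffix the host directories (this is not an existing chart option):

```yaml
# DaemonSet volume snippet (sketch): per-instance hostPath directories,
# keyed on a hypothetical .Values.instance value; the hostPath type is
# illustrative as well
volumes:
  - name: source-d
    hostPath:
      path: /etc/kubernetes/node-feature-discovery/source.d.{{ .Values.instance }}/
      type: DirectoryOrCreate
  - name: features-d
    hostPath:
      path: /etc/kubernetes/node-feature-discovery/features.d.{{ .Values.instance }}/
      type: DirectoryOrCreate
```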

marquiz avatar Aug 09 '22 06:08 marquiz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 07 '22 07:11 k8s-triage-robot

/remove-lifecycle stale

vaibhav2107 avatar Nov 14 '22 07:11 vaibhav2107

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 12 '23 08:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 14 '23 08:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Apr 13 '23 09:04 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 13 '23 09:04 k8s-ci-robot