[dev-v2.9] Use Upstream Windows Node Exporter In Rancher Monitoring

Open HarrisonWAffel opened this issue 1 year ago • 18 comments

Issue: https://github.com/rancher/windows/issues/223, https://github.com/rancher/windows/issues/234

Problem

Historically, Rancher has implemented a custom chart for the Windows node exporter component of Monitoring V2. This was due to a dependency on rancher-wins, which was required to start the node exporter process on the host and scrape metrics from WMI. Unfortunately, this custom chart not only adds maintenance burden; its reliance on rancher-wins also prevents the Windows node exporter from being deployed in certain environments (such as imported clusters and hosted providers, e.g. EKS and AKS).

Solution

With the introduction of HostProcess pods on Windows, we can now run processes directly on the host without the need for rancher-wins. Additionally, upstream has updated its chart, which can now be used directly by Rancher. This PR completely removes the old Windows exporter sub-chart in favor of pulling in the upstream chart and modifying it as needed. This not only eliminates the use of rancher-wins, allowing the node exporter to be deployed in all environments, but also reduces the maintenance burden of the sub-chart.
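For readers unfamiliar with host process pods, the following is a minimal sketch of the general mechanism, not the manifest rendered by this PR or by the upstream chart. The resource name, labels, and image reference are hypothetical placeholders; the `windowsOptions.hostProcess`, `runAsUserName`, and `hostNetwork` fields are the standard Kubernetes settings for this kind of workload.

```yaml
# Illustrative only: a DaemonSet that runs a container as a Windows
# HostProcess pod. Names and the image are placeholders (assumptions),
# not values taken from this PR or the upstream chart.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: windows-exporter-example          # hypothetical name
spec:
  selector:
    matchLabels:
      app: windows-exporter-example       # hypothetical label
  template:
    metadata:
      labels:
        app: windows-exporter-example
    spec:
      nodeSelector:
        kubernetes.io/os: windows         # schedule only onto Windows nodes
      hostNetwork: true                   # required for HostProcess pods
      securityContext:
        windowsOptions:
          hostProcess: true               # run directly on the host, no rancher-wins needed
          runAsUserName: "NT AUTHORITY\\SYSTEM"
      containers:
        - name: windows-exporter
          image: registry.example.com/windows-exporter:placeholder  # placeholder image
          ports:
            - containerPort: 9182         # windows_exporter's default listen port
```

Because the container runs in the host's network namespace, the exporter port is opened on the node itself, which is why firewall rules on the host become relevant.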

This PR makes minor changes to the upstream chart to ensure that configuration options and settings continue to be applied in the same way as in older chart versions. Additionally, a dedicated PowerShell script has been added which automatically configures the required firewall rules (upstream uses a simple init container, but in practice this errors out after an uninstall / reinstall of the chart).

Testing

I've deployed this chart onto custom clusters created on Azure, as well as AKS and EKS. In all tests the chart deploys properly and I can create Grafana charts detailing Windows node metrics. I am able to upgrade / downgrade / reinstall the chart without issue.

Engineering Testing

Manual Testing

see above

Automated Testing

n/a

QA Testing Considerations

We will want to test this change specifically in hosted providers such as EKS, AKS, and GKE.

Regressions Considerations

The intention of this PR is to persist all of the settings and custom configuration of the existing custom chart onto the upstream chart. However, there are likely some behavior changes between these two charts, simply because the existing custom chart has not been updated for ~1 year (or more). Testing should be done to identify any gaps or missing features that result from this change, so that we can either address the issue or document their removal from the upstream chart / exporter binary.

Backporting considerations

This change will not be backported to 2.7 or 2.8. We will continue to maintain the custom chart for those release lines, but will only use the upstream chart for 2.9+.

HarrisonWAffel avatar Mar 11 '24 20:03 HarrisonWAffel

Validation steps

  • Ensure all container images have repository and tag on the same level, so that all container images are included in rancher-images.txt, which is used by airgap customers.
  Example:
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

github-actions[bot] avatar Mar 11 '24 20:03 github-actions[bot]

No need to flag the PR as a draft in the title. Just open it as a draft and all will be fine.

lucasmlp avatar Mar 15 '24 21:03 lucasmlp

This PR is finally ready for review! After looking at release/v2.9 I could not find a "released" 104 version of Rancher monitoring, so I've updated this PR to use version 104.0.0-rc2. Let me know if I should change that to another version.

HarrisonWAffel avatar Apr 11 '24 21:04 HarrisonWAffel

@adamkpickering Good catch! I forgot I had made the same versioning change for the sub-chart as well. Should be fixed now.

HarrisonWAffel avatar Apr 23 '24 20:04 HarrisonWAffel

@joshmeranda, could you please re-review?

@HarrisonWAffel, it looks like there are merge conflicts, including in the tgz file, so you may need to bump the version?

snasovich avatar May 13 '24 20:05 snasovich
