metrics-server icon indicating copy to clipboard operation
metrics-server copied to clipboard

feat(helm chart): Add metrics-server hardening options

Open mkilchhofer opened this issue 2 years ago • 31 comments

What this PR does / why we need it:

It adds options on howto secure the connection between metrics-server and the kube-apiserver.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

related to:

  • #545
  • #1230

Resolves #1230

Additional info: The accepted FR #1230 is implemented also as I need this to test the hardening feature.

/cc: @serathius

mkilchhofer avatar Jul 14 '23 13:07 mkilchhofer

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: mkilchhofer / name: Marco Maurer (-Kilchhofer) (8e1c090a446801329236e78c00d3787cc0ae062c)

Welcome @mkilchhofer!

It looks like this is your first PR to kubernetes-sigs/metrics-server 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/metrics-server has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Jul 14 '23 13:07 k8s-ci-robot

Hi @mkilchhofer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 14 '23 13:07 k8s-ci-robot

/assign @stevehipwell

serathius avatar Jul 14 '23 21:07 serathius

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkilchhofer Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 17 '23 06:07 k8s-ci-robot

  • There is no need to run functions for the APIService if it's not enabled

How do you mean that? The genSelfSignedCert function?

  • The helm mode would be a better default (see below)

Sure, will do

  • You don't need to recreate the cert on each Helm install, you can look up the existing secret

I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use helm for rendering (helm template subcommand). The lookup function of helm returns nothing during template as it cannot talk to the kubeapi by design. Only upgrade or install talks to the apiserver.

  • I don't think the chart should support creating a secret, the secret option should only be existing

Hmm why not? It is pretty common that helm charts support this.

  • Please remove the extraObjects value, this isn't a "good" pattern

How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:

  • #1230 -> See label triage/accepted

mkilchhofer avatar Jul 17 '23 11:07 mkilchhofer

How do you mean that? The genSelfSignedCert function?

You moved the test for if the APIService was enabled below the new code.

I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use helm for rendering (helm template subcommand). The lookup function of helm returns nothing during template as it cannot talk to the kubeapi by design. Only upgrade or install talks to the apiserver.

In that case they can use one of the other patterns, GitOps shouldn't define architecture it should support best practice.

Hmm why not? It is pretty common that helm charts support this.

Because it's unnecessary and leaks data into the chart values.

How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:

We could add a test secret as part or the CI initialisation or it could be generated through the chart as a testing resource.

stevehipwell avatar Jul 17 '23 11:07 stevehipwell

Because it's unnecessary and leaks data into the chart values.

Well, why is that an issue? This is how helm is designed for. These values are stored as a K8s secret.

mkilchhofer avatar Jul 17 '23 11:07 mkilchhofer

Well, why is that an issue? This is how helm is designed for. These values are stored as a K8s secret.

It's the unnecessary part which is the primary concern here; but the Helm implementation and increased surface area for secret information shouldn't be discounted as unimportant.

But as I've said in my other replies you can easily extend this chart by creating a wrapper chart which depends on it and at which point you can make your own architectural decisions.

stevehipwell avatar Jul 17 '23 12:07 stevehipwell

/triage accepted

dashpole avatar Jul 27 '23 16:07 dashpole

@mkilchhofer could you please rebase and then I'll review again?

stevehipwell avatar Jul 28 '23 14:07 stevehipwell

@stevehipwell rebase done

mkilchhofer avatar Jul 28 '23 18:07 mkilchhofer

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 28 '23 03:10 k8s-ci-robot

@mkilchhofer could you rebase this PR?

stevehipwell avatar Dec 01 '23 12:12 stevehipwell

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 29 '24 12:02 k8s-triage-robot

Will try to rebase the PR as soon as I have time

mkilchhofer avatar Feb 29 '24 13:02 mkilchhofer

/remove-lifecycle stale

yangjunmyfm192085 avatar Feb 29 '24 13:02 yangjunmyfm192085

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 29 '24 14:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 28 '24 14:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Jul 28 '24 14:07 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 28 '24 14:07 k8s-ci-robot

/reopen

mkilchhofer avatar Jul 28 '24 21:07 mkilchhofer

@mkilchhofer: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 28 '24 21:07 k8s-ci-robot

@mkilchhofer could you rebase this PR?

@stevehipwell rebase done.

mkilchhofer avatar Jul 29 '24 21:07 mkilchhofer

@mkilchhofer could you add your change to the chart CHANGELOG?

/ok-to-test

@stevehipwell For sure. Done 👍

mkilchhofer avatar Jul 30 '24 19:07 mkilchhofer

@mkilchhofer thanks for this PR, everything looks in order and I can see the CI automation is working. As we only currently check that the pod started correctly this doesn't necessarily mean that the changes actually work, which isn't usually an issue as the actual changes are usually contained in the binary which has been heavily tested. So could you confirm if you have you tested all these configurations locally?

I think that before we release the chart we ought to add a Helm test resource on a hook to confirm that these work and don't regress in the future, but I'm happy to take this PR as is if you've tested locally.

stevehipwell avatar Aug 01 '24 10:08 stevehipwell

So could you confirm if you have you tested all these configurations locally?

Yep, I tested it within KinD (kindest/node:v1.29.2):

Bootstrap a kind cluster

$ pwd
/Users/mkilchhofer/git/github/metrics-server/charts/metrics-server

$ kind create cluster --config ../../test/kind-config.yaml
Creating cluster "kind" ...
(..)

$ kubie ctx kind-kind

$ helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --wait \
  --set installCRDs=true \
  --set extraArgs='{--enable-certificate-owner-ref}'
NAME: cert-manager
LAST DEPLOYED: Thu Aug  1 14:22:38 2024
(..)

Test case Option 1 (helm)

$ helm install metrics-server . --namespace kube-system --values ci/tls-helm-values.yaml
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:26:24 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     3m           14Mi
coredns-76f75df574-lfxqp                     3m           14Mi
etcd-kind-control-plane                      44m          44Mi
kindnet-94f7t                                1m           10Mi
kindnet-s7zgz                                1m           9Mi
kindnet-t5tv7                                1m           9Mi
kube-apiserver-kind-control-plane            137m         277Mi
kube-controller-manager-kind-control-plane   47m          52Mi
kube-proxy-2xpwm                             2m           14Mi
kube-proxy-72gdt                             1m           15Mi
kube-proxy-gdhf7                             1m           16Mi
kube-scheduler-kind-control-plane            6m           22Mi
metrics-server-5bc86f74cc-x5q7j              9m           19M

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

Test case Option 2 (cert-manager)

$ helm install metrics-server . --namespace kube-system --values ci/tls-certManager-values.yaml --wait
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:28:33 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     4m           14Mi
coredns-76f75df574-lfxqp                     4m           14Mi
etcd-kind-control-plane                      53m          44Mi
kindnet-94f7t                                1m           10Mi
kindnet-s7zgz                                1m           9Mi
kindnet-t5tv7                                1m           9Mi
kube-apiserver-kind-control-plane            146m         294Mi
kube-controller-manager-kind-control-plane   60m          55Mi
kube-proxy-2xpwm                             1m           14Mi
kube-proxy-72gdt                             1m           15Mi
kube-proxy-gdhf7                             1m           16Mi
kube-scheduler-kind-control-plane            8m           22Mi
metrics-server-5bc86f74cc-k7ggm              9m           19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

Test case Option 3 (existing secret)

$ openssl req -x509 -newkey rsa:2048 -sha256 -days 365 \
  -nodes -keyout tls.key -out tls.crt \
  -subj "/CN=metrics-server" \
  -addext "subjectAltName=DNS:metrics-server,DNS:metrics-server.kube-system.svc"
.....+ (..)

$ kubectl -n kube-system create secret generic metrics-server-existing \
  --from-file=tls.key \
  --from-file=tls.crt
secret/metrics-server-existing created

$ cat <<EOF >> ci/tls-existingSecret-values.yaml
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
$(cat tls.crt  | sed -e "s/^/    /g")
EOF

$ cat ci/tls-existingSecret-values.yaml
args:
  - --kubelet-insecure-tls

## Set via GH action (step "Prepare existing secret test scenario")
# apiService:
#   insecureSkipTLSVerify: false
#   caBundle: |

tls:
  type: existingSecret
  existingSecret:
    name: metrics-server-existing
apiService:
  insecureSkipTLSVerify: false
  caBundle: |
    -----BEGIN CERTIFICATE-----
    (..)
    -----END CERTIFICATE-----

$ helm install metrics-server . --namespace kube-system --values ci/tls-existingSecret-values.yaml --wait
NAME: metrics-server
LAST DEPLOYED: Thu Aug  1 14:44:15 2024
(..)

$ kubectl top pods
NAME                                         CPU(cores)   MEMORY(bytes)
coredns-76f75df574-j7bsd                     4m           22Mi
coredns-76f75df574-lfxqp                     4m           21Mi
etcd-kind-control-plane                      50m          53Mi
kindnet-94f7t                                2m           16Mi
kindnet-s7zgz                                1m           15Mi
kindnet-t5tv7                                1m           17Mi
kube-apiserver-kind-control-plane            146m         290Mi
kube-controller-manager-kind-control-plane   51m          53Mi
kube-proxy-2xpwm                             1m           17Mi
kube-proxy-72gdt                             2m           16Mi
kube-proxy-gdhf7                             3m           16Mi
kube-scheduler-kind-control-plane            7m           24Mi
metrics-server-868545df69-r96pf              11m          19Mi

$ helm uninstall metrics-server --namespace kube-system
release "metrics-server" uninstalled

When I have time at a later time, I can investigate if we can test these scenarios with a helm test.

mkilchhofer avatar Aug 01 '24 12:08 mkilchhofer

@mkilchhofer RE testing I was thinking of using Helm testing hooks to install a pod and a job to check the metrics for the pod are being served. This test would work for all CI tests (existing pattern and new patterns).

stevehipwell avatar Aug 01 '24 13:08 stevehipwell

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkilchhofer, stevehipwell Once this PR has been reviewed and has the lgtm label, please assign logicalhan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Aug 01 '24 13:08 k8s-ci-robot

@logicalhan could you please approve this?

stevehipwell avatar Aug 01 '24 13:08 stevehipwell

@logicalhan friendly ping

mkilchhofer avatar Aug 20 '24 18:08 mkilchhofer

@stevehipwell @logicalhan friendly ping

mkilchhofer avatar Sep 03 '24 07:09 mkilchhofer