
metrics-server is not fully installing in my vclusters (applying apiservice 'v1beta1.metrics.k8s.io' is failing)

lknite opened this issue 3 years ago · 2 comments

What happened?

I've recently been getting up to speed with vcluster, so when metrics-server didn't deploy I erased all my vclusters and redeployed them, then installed metrics-server again on both the host cluster and the vclusters using a known-to-work helm chart and values.yaml file. It works on the host cluster, but on the vclusters everything installs except the apiservice.

What did you expect to happen?

metrics-server to work on both the vclusters and host cluster

How can we reproduce it (as minimally and precisely as possible)?

  1. deploy host cluster using 'centos-8-stream' nodes and kubeadm
  2. deploy vclusters
  3. deploy metrics-server

I'm using Argo CD with an ApplicationSet to deploy metrics-server to all my clusters, but you can also just deploy the helm chart from the bitnami repo.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: metrics-server
  namespace: argocd-global

spec:
  generators:
  - list:
      elements:
      - name: root
        server: https://k-root-c1:6443
      - name: vc-non
        server: https://vc-non.root.k.home.net
#      - name: vc-prod
#        server: https://vc-prod.root.k.home.net
#  - clusters:
#      selector:
#        matchLabels:
#          argocd.argoproj.io/secret-type: cluster

  template: 
    # This is a template Argo CD Application, but with support for parameter substitution.
    metadata:
      name: '{{name}}-metrics-server'
    spec:
      project: "default"
      source:
        repoURL: https://charts.bitnami.com/bitnami
        chart: metrics-server
        targetRevision: 6.0.8

        helm:
          releaseName: "metrics-server"
          parameters:
          - name: extraArgs[0]
            value: "--kubelet-insecure-tls=true"
          - name: extraArgs[1]
            value: "--kubelet-preferred-address-types=InternalIP"
          - name: apiService.create
            value: "true"

      destination:
        server: '{{server}}'
        namespace: metrics-server

      syncPolicy:
        syncOptions:
        - CreateNamespace=true

        automated:
          selfHeal: true
          prune: true
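For reference, the same chart can be installed without Argo CD. A sketch using plain Helm, mirroring the chart version and parameters from the ApplicationSet above (release name and namespace are the same as in the template):

```shell
# Add the bitnami repo and install the same chart/version the ApplicationSet uses
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install metrics-server bitnami/metrics-server \
  --version 6.0.8 \
  --namespace metrics-server --create-namespace \
  --set "extraArgs[0]=--kubelet-insecure-tls=true" \
  --set "extraArgs[1]=--kubelet-preferred-address-types=InternalIP" \
  --set apiService.create=true
```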

my vcluster deployment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vc-non
  namespace: argocd
#  annotations:
#    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  project: default
  source:
    repoURL: https://charts.loft.sh
    targetRevision: v0.11.0
    #targetRevision: v0.11.0-beta.0
    #targetRevision: 0.10.2
    chart: vcluster

    helm:
      releaseName: vc-non
      parameters:
      #- name: vcluster.image
      #  value: "rancher/k3s:v1.24.3-k3s1"
      #must be manually specified if providing our own ingress, when using built-in ingress it auto-adds
      #- name: syncer.extraArgs[0]
      #  value: "--tls-san=vc-non.root.k.home.net"
      #- name: sync.services.enabled
      #  value: "false"
      #- name: sync.configmaps.enabled
      #  value: "false"
      #- name: sync.secrets.enabled
      #  value: "false"
      #- name: sync.endpoints.enabled
      #  value: "false"
      - name: sync.persistentvolumes.enabled
        value: "true"
      #- name: sync.ingresses.enabled
      #  value: "false"
      - name: sync.storageclasses.enabled
        value: "true"
      - name: sync.nodes.enabled
        value: "true"
      - name: sync.nodes.syncAllNodes
        value: "true"
      - name: ingress.enabled
        value: "true"
      - name: ingress.ingressClassName
        value: "nginx"
      - name: ingress.host
        value: vc-non.root.k.home.net
      - name: ingress.annotations.cert-manager\.io\/issuer
        value: "cluster-adcs-issuer"
      - name: ingress.annotations.cert-manager\.io\/issuer-kind
        value: "ClusterAdcsIssuer"
      - name: ingress.annotations.cert-manager\.io\/issuer-group
        value: "adcs.certmanager.csf.nokia.com"
      - name: init.manifests
        value: |
            apiVersion: storage.k8s.io/v1
            allowVolumeExpansion: true
            kind: StorageClass
            metadata:
              annotations:
                storageclass.kubernetes.io/is-default-class: "true"
              creationTimestamp: "2022-07-24T23:09:35Z"
              labels:
                app.kubernetes.io/instance: democratic-csi-iscsi
                app.kubernetes.io/managed-by: Helm
                app.kubernetes.io/name: democratic-csi
                argocd.argoproj.io/instance: root-democratic-csi-iscsi
                helm.sh/chart: democratic-csi-0.13.1
              name: freenas-iscsi-csi
              resourceVersion: "1834204"
              uid: e9122313-1902-409d-887c-1b5ba2539ae3
            parameters:
              csi.storage.k8s.io/controller-expand-secret-name: controller-expand-secret-freenas-iscsi-csi-democratic-csi-iscsi
              csi.storage.k8s.io/controller-expand-secret-namespace: democratic-csi-iscsi
              csi.storage.k8s.io/controller-publish-secret-name: controller-publish-secret-freenas-iscsi-csi-democratic-csi-iscs
              csi.storage.k8s.io/controller-publish-secret-namespace: democratic-csi-iscsi
              csi.storage.k8s.io/node-publish-secret-name: node-publish-secret-freenas-iscsi-csi-democratic-csi-iscsi
              csi.storage.k8s.io/node-publish-secret-namespace: democratic-csi-iscsi
              csi.storage.k8s.io/node-stage-secret-name: node-stage-secret-freenas-iscsi-csi-democratic-csi-iscsi
              csi.storage.k8s.io/node-stage-secret-namespace: democratic-csi-iscsi
              csi.storage.k8s.io/provisioner-secret-name: provisioner-secret-freenas-iscsi-csi-democratic-csi-iscsi
              csi.storage.k8s.io/provisioner-secret-namespace: democratic-csi-iscsi
              fsType: ext4
            provisioner: org.democratic-csi.iscsi
            reclaimPolicy: Delete
            volumeBindingMode: Immediate

  destination:
    server: https://kubernetes.default.svc
    namespace: vc-non

  syncPolicy:
    syncOptions:
    - CreateNamespace=true

    automated:
      selfHeal: true
      prune: true

Anything else we need to know?

When I look at the deployment in argocd I can see that everything is deployed as expected except for one thing: the v1beta1.metrics.k8s.io apiservice, which has the health check message:

FailedDiscoveryCheck: failing or missing response from https://172.28.76.171:8443/apis/metrics.k8s.io/v1beta1: Get "https://172.28.76.171:8443/apis/metrics.k8s.io/v1beta1": proxy error from 127.0.0.1:6443 while dialing 172.28.76.171:8443, code 500: 500 Internal Server Error
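When only the APIService is unhealthy, it can help to inspect it from inside the vcluster. A sketch of the checks to run (assuming a kubeconfig pointed at the vcluster, and the metrics-server namespace used by the chart above):

```shell
# Show the APIService condition (Available should be True when healthy)
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl describe apiservice v1beta1.metrics.k8s.io

# Verify the backing service and endpoints exist inside the vcluster
kubectl -n metrics-server get svc,endpoints

# Ask the apiserver to proxy to the aggregated API, reproducing the failing call
kubectl get --raw /apis/metrics.k8s.io/v1beta1
```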

Host cluster Kubernetes version

$ kubectl version --output yaml
clientVersion:
  buildDate: "2022-05-24T12:26:19Z"
  compiler: gc
  gitCommit: 3ddd0f45aa91e2f30c70734b175631bec5b5825a
  gitTreeState: clean
  gitVersion: v1.24.1
  goVersion: go1.18.2
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2022-07-13T14:23:26Z"
  compiler: gc
  gitCommit: aef86a93758dc3cb2c658dd9657ab4ad4afc21cb
  gitTreeState: clean
  gitVersion: v1.24.3
  goVersion: go1.18.3
  major: "1"
  minor: "24"
  platform: linux/amd64

Host cluster Kubernetes distribution

The nodes are centos-8-stream, and I used kubeadm to manually install the cluster.

vcluster version

$ vcluster --version
vcluster version 0.10.2

but I didn't use the CLI for this; I installed via helm

Vcluster Kubernetes distribution (k3s (default), k8s, k0s)

k3s(default)

OS and Arch

OS: centos-8-stream
Arch: amd64

lknite avatar Aug 07 '22 15:08 lknite

@lknite thanks for reporting this issue! Seems like there is an issue with port-forwarding internally, which we'll investigate and fix.

FabianKramm avatar Aug 08 '22 22:08 FabianKramm

Hi @lknite, I tried reproducing this issue in two ways:

  1. I installed metrics-server on a fresh vcluster using the upstream manifests provided here - https://github.com/kubernetes-sigs/metrics-server#installation
  2. I also tried installing it via the official helm chart from Artifact Hub - https://artifacthub.io/packages/helm/metrics-server/metrics-server

In both cases I did not encounter any issues, and the metrics-server deployment came up successfully.

At this point I'd ask you to try a few more things on your end:

  1. It looks like you're using a self-managed cluster. Can you try this deployment on an alternate kind/docker-desktop/minikube cluster, just to rule out a cluster-specific issue where your cluster is missing something?
  2. I would also suggest installing the official helm chart, or the official manifest - kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml - so that we can rule out your ArgoCD Application missing something.
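For example, a minimal repro on a throwaway kind cluster might look like this (a sketch; the cluster name is arbitrary, and on kind the pod often won't become Ready without adding --kubelet-insecure-tls to its args):

```shell
# Spin up a throwaway cluster and apply the upstream manifest
kind create cluster --name metrics-test
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl -n kube-system rollout status deploy/metrics-server
```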

Thanks.

ishankhare07 avatar Aug 10 '22 09:08 ishankhare07

Closing. I'm not sure of the details, but after upgrading from 0.11.0 to 0.12.2, metrics-server immediately installed and started working. A namespace that wouldn't delete because of something metrics-related was also immediately removed.

lknite avatar Oct 09 '22 01:10 lknite
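For anyone hitting the same thing, the fix above amounts to a vcluster chart version bump. A sketch with plain Helm, assuming the release name and namespace from the Application earlier in the thread (the "loft" repo alias is an assumption; Argo CD users would instead bump targetRevision to v0.12.2):

```shell
# Add the loft chart repo and upgrade the existing release in place
helm repo add loft https://charts.loft.sh
helm repo update
helm upgrade vc-non loft/vcluster \
  --namespace vc-non \
  --version 0.12.2 \
  --reuse-values
```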