[Improvement] Addon status should sync with pod status

Open ahjing99 opened this issue 2 years ago • 6 comments

➜  ~ kbcli version
Kubernetes: v1.27.1
KubeBlocks: 0.6.0-alpha.0
kbcli: 0.6.0-alpha.0
➜  ~ kind version
kind v0.19.0 go1.20.4 darwin/arm64

  1. brew install kind
  2. Create a cluster
➜  ~ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.1) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊
  3. Install KubeBlocks
➜  ~ kbcli kubeblocks install
KubeBlocks will be installed to namespace "kb-system"
Kubernetes version 1.27.1
kbcli version 0.6.0-alpha.0
Add and update repo kubeblocks                     OK
Install KubeBlocks 0.6.0-alpha.0                   OK
Wait for addons to be enabled
  Addon alertmanager-webhook-adaptor               OK
  Addon apecloud-mysql                             OK
  Addon grafana                                    OK
  Addon milvus                                     OK
  Addon mongodb                                    OK
  Addon postgresql                                 OK
  Addon prometheus                                 Failed
  Addon qdrant                                     OK
  Addon redis                                      OK
  Addon snapshot-controller                        OK
  Addon weaviate                                   OK
error: timeout waiting for auto-install addons to be enabled, run "kbcli addon list" to check addon status

➜  ~ k get pod -n kb-system
NAME                                                    READY   STATUS      RESTARTS   AGE
install-prometheus-addon-lvddt                          0/1     Completed   0          3m14s
install-prometheus-addon-nnq68                          0/1     Error       0          8m29s
kb-addon-alertmanager-webhook-adaptor-856488566-ktdkl   2/2     Running     0          8m26s
kb-addon-grafana-7554cf5785-fvgzt                       3/3     Running     0          8m24s
kb-addon-prometheus-alertmanager-0                      2/2     Running     0          3m11s
kb-addon-prometheus-server-0                            2/2     Running     0          3m11s
kb-addon-snapshot-controller-65fcc74964-9m8hh           1/1     Running     0          8m24s
kubeblocks-866c7bf687-sbjb4                             1/1     Running     0          9m56s

➜  ~ k logs install-prometheus-addon-nnq68 -n kb-system
Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

➜  ~ kbcli addon list
NAME                           TYPE   STATUS     EXTRAS         AUTO-INSTALL   AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller   Helm   Disabled                  false          {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh                     Helm   Disabled                  false
csi-hostpath-driver            Helm   Disabled                  false          {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3                         Helm   Disabled                  false
kubeblocks-csi-driver          Helm   Disabled   node           false          {key=KubeGitVersion,op=Contains,values=[eks]}
migration                      Helm   Disabled                  false
nyancat                        Helm   Disabled                  false
opensearch                     Helm   Disabled                  false
alertmanager-webhook-adaptor   Helm   Enabled                   true
apecloud-mysql                 Helm   Enabled                   true
grafana                        Helm   Enabled                   true
milvus                         Helm   Enabled                   true
mongodb                        Helm   Enabled                   true
postgresql                     Helm   Enabled                   true
qdrant                         Helm   Enabled                   true
redis                          Helm   Enabled                   true
snapshot-controller            Helm   Enabled                   true           {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
weaviate                       Helm   Enabled                   true
prometheus                     Helm   Failed     alertmanager   true

➜  ~ k describe addon prometheus
Name:         prometheus
Namespace:
Labels:       app.kubernetes.io/instance=kubeblocks
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=kubeblocks
              app.kubernetes.io/version=0.6.0-alpha.0
              helm.sh/chart=kubeblocks-0.6.0-alpha.0
              kubeblocks.io/provider=community
Annotations:  meta.helm.sh/release-name: kubeblocks
              meta.helm.sh/release-namespace: kb-system
API Version:  extensions.kubeblocks.io/v1alpha1
Kind:         Addon
Metadata:
  Creation Timestamp:  2023-05-18T07:32:59Z
  Finalizers:
    addon.kubeblocks.io/finalizer
  Generation:  2
  Managed Fields:
    API Version:  extensions.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
          f:app.kubernetes.io/managed-by:
          f:app.kubernetes.io/name:
          f:app.kubernetes.io/version:
          f:helm.sh/chart:
          f:kubeblocks.io/provider:
      f:spec:
        .:
        f:defaultInstallValues:
        f:description:
        f:helm:
          .:
          f:chartLocationURL:
          f:installValues:
            .:
            f:configMapRefs:
          f:valuesMapping:
            .:
            f:extras:
              .:
              k:{"name":"alertmanager"}:
                .:
                f:jsonMap:
                  .:
                  f:tolerations:
                f:name:
                f:resources:
                  .:
                  f:cpu:
                    .:
                    f:limits:
                    f:requests:
                  f:memory:
                    .:
                    f:limits:
                    f:requests:
                  f:storage:
                f:valueMap:
                  .:
                  f:persistentVolumeEnabled:
                  f:replicaCount:
                  f:storageClass:
            f:jsonMap:
              .:
              f:tolerations:
            f:resources:
              .:
              f:cpu:
                .:
                f:limits:
                f:requests:
              f:memory:
                .:
                f:limits:
                f:requests:
              f:storage:
            f:valueMap:
              .:
              f:persistentVolumeEnabled:
              f:replicaCount:
              f:storageClass:
        f:installable:
          .:
          f:autoInstall:
        f:type:
    Manager:      kbcli
    Operation:    Update
    Time:         2023-05-18T07:32:59Z
    API Version:  extensions.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"addon.kubeblocks.io/finalizer":
      f:spec:
        f:install:
          .:
          f:enabled:
          f:extras:
            .:
            k:{"name":"alertmanager"}:
              .:
              f:name:
              f:replicas:
              f:resources:
                .:
                f:requests:
                  .:
                  f:storage:
              f:tolerations:
          f:replicas:
          f:resources:
            .:
            f:limits:
              .:
              f:memory:
            f:requests:
              .:
              f:memory:
              f:storage:
          f:tolerations:
    Manager:      manager
    Operation:    Update
    Time:         2023-05-18T07:34:24Z
    API Version:  extensions.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2023-05-18T07:39:34Z
  Resource Version:  2220
  UID:               18398099-6e08-4f0a-ac04-9c14f4d8d52f
Spec:
  Default Install Values:
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  4Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  10Gi
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  20Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  20Gi
    Selectors:
      Key:       KubeGitVersion
      Operator:  Contains
      Values:
        aliyun
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  10Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  10Gi
    Selectors:
      Key:       KubeGitVersion
      Operator:  Contains
      Values:
        tke
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
  Description:    Prometheus is a monitoring system and time series database.
  Helm:
    Chart Location URL:  https://jihulab.com/api/v4/projects/85949/packages/helm/stable/charts/prometheus-15.16.1.tgz
    Install Values:
      Config Map Refs:
        Key:   values-kubeblocks-override.yaml
        Name:  prometheus-chart-kubeblocks-values
    Values Mapping:
      Extras:
        Json Map:
          Tolerations:  alertmanager.tolerations
        Name:           alertmanager
        Resources:
          Cpu:
            Limits:    alertmanager.resources.limits.cpu
            Requests:  alertmanager.resources.requests.cpu
          Memory:
            Limits:    alertmanager.resources.limits.memory
            Requests:  alertmanager.resources.requests.memory
          Storage:     alertmanager.persistentVolume.size
        Value Map:
          Persistent Volume Enabled:  alertmanager.persistentVolume.enabled
          Replica Count:              alertmanager.replicaCount
          Storage Class:              alertmanager.persistentVolume.storageClass
      Json Map:
        Tolerations:  server.tolerations
      Resources:
        Cpu:
          Limits:    server.resources.limits.cpu
          Requests:  server.resources.requests.cpu
        Memory:
          Limits:    server.resources.limits.memory
          Requests:  server.resources.requests.memory
        Storage:     server.persistentVolume.size
      Value Map:
        Persistent Volume Enabled:  server.persistentVolume.enabled
        Replica Count:              server.replicaCount
        Storage Class:              server.persistentVolume.storageClass
  Install:
    Enabled:  true
    Extras:
      Name:      alertmanager
      Replicas:  1
      Resources:
        Requests:
          Storage:  4Gi
      Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
    Replicas:       1
    Resources:
      Limits:
        Memory:  4Gi
      Requests:
        Memory:   512Mi
        Storage:  10Gi
    Tolerations:  [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
  Installable:
    Auto Install:  true
  Type:            Helm
Status:
  Conditions:
    Last Transition Time:  2023-05-18T07:39:34Z
    Message:               Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

    Observed Generation:  2
    Reason:               InstallationFailedLogs
    Status:               False
    Type:                 InstallableChecked
  Observed Generation:    2
  Phase:                  Failed
Events:
  Type     Reason                  Age    From              Message
  ----     ------                  ----   ----              -------
  Normal   AddonAutoInstall        11m    addon-controller  Addon enabled auto-install
  Normal   EnablingAddon           11m    addon-controller  Progress to Enabling phase
  Warning  InstallationFailed      5m53s  addon-controller  Installation failed, do inspect error from jobs.batch kb-system/install-prometheus-addon
  Warning  InstallationFailedLogs  5m53s  addon-controller  Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

ahjing99 · May 18 '23 07:05

Tried again. The addons were eventually installed, but their status in "kbcli addon list" is still Failed. The status should sync with the actual pod status.

➜  ~ kbcli addon list
NAME                           TYPE   STATUS     EXTRAS         AUTO-INSTALL   AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller   Helm   Disabled                  false          {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh                     Helm   Disabled                  false
csi-hostpath-driver            Helm   Disabled                  false          {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3                         Helm   Disabled                  false
kubeblocks-csi-driver          Helm   Disabled   node           false          {key=KubeGitVersion,op=Contains,values=[eks]}
migration                      Helm   Disabled                  false
nyancat                        Helm   Disabled                  false
opensearch                     Helm   Disabled                  false
apecloud-mysql                 Helm   Enabled                   true
milvus                         Helm   Enabled                   true
mongodb                        Helm   Enabled                   true
postgresql                     Helm   Enabled                   true
qdrant                         Helm   Enabled                   true
redis                          Helm   Enabled                   true
snapshot-controller            Helm   Enabled                   true           {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
alertmanager-webhook-adaptor   Helm   Failed                    true
grafana                        Helm   Failed                    true
prometheus                     Helm   Failed     alertmanager   true
weaviate                       Helm   Failed                    true

➜  ~ k get pod -n kb-system
NAME                                                    READY   STATUS      RESTARTS   AGE
install-alertmanager-webhook-adaptor-addon-gt8wf        0/1     Error       0          6m34s
install-alertmanager-webhook-adaptor-addon-ljr2h        0/1     Error       0          17m
install-alertmanager-webhook-adaptor-addon-mwgxm        0/1     Error       0          11m
install-alertmanager-webhook-adaptor-addon-vm28h        0/1     Completed   0          48s
install-grafana-addon-b8ffw                             0/1     Completed   0          49s
install-grafana-addon-gvknd                             0/1     Error       0          6m32s
install-grafana-addon-x2h8l                             0/1     Error       0          11m
install-grafana-addon-zxdxl                             0/1     Error       0          17m
install-prometheus-addon-gss6v                          0/1     Error       0          17m
install-prometheus-addon-jn5k6                          0/1     Error       0          11m
install-prometheus-addon-nwhvm                          0/1     Error       0          6m32s
install-prometheus-addon-zflbw                          0/1     Completed   0          49s
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh   2/2     Running     0          46s
kb-addon-grafana-7554cf5785-7dcfl                       3/3     Running     0          46s
kb-addon-prometheus-alertmanager-0                      2/2     Running     0          45s
kb-addon-prometheus-server-0                            2/2     Running     0          40s
kb-addon-snapshot-controller-65fcc74964-ck7s7           1/1     Running     0          17m
kubeblocks-866c7bf687-2q9lj                             1/1     Running     0          19m
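The listing above shows three failed install-prometheus-addon attempts (17m, 11m, 6m32s) followed by a Completed one (49s), yet the addon stays Failed. The sketch below is a minimal illustration, assuming the phase were derived from the most recent install attempt instead of staying latched on an earlier failure. InstallAttempt and phase_from_install_attempts are hypothetical names, not KubeBlocks' actual addon controller code.

```python
from dataclasses import dataclass


# Hypothetical model of one install-<addon>-addon job pod, as listed by
# "k get pod -n kb-system" (only the STATUS and AGE columns matter here).
@dataclass
class InstallAttempt:
    status: str       # e.g. "Error" or "Completed"
    age_seconds: int  # a smaller age means a more recent attempt


def phase_from_install_attempts(attempts: list[InstallAttempt]) -> str:
    """Derive an addon phase from the most recent install attempt only.

    Assumption for illustration: an older failed attempt should not keep
    the addon in Failed once a newer attempt has completed.
    """
    if not attempts:
        return "Enabling"  # no attempt observed yet
    latest = min(attempts, key=lambda a: a.age_seconds)
    if latest.status == "Completed":
        return "Enabled"
    if latest.status == "Error":
        return "Failed"
    return "Enabling"  # e.g. Pending or Running


# The prometheus install pods from the listing above (ages in seconds).
prometheus_attempts = [
    InstallAttempt("Error", 17 * 60),
    InstallAttempt("Error", 11 * 60),
    InstallAttempt("Error", 6 * 60 + 32),
    InstallAttempt("Completed", 49),
]
print(phase_from_install_attempts(prometheus_attempts))  # Enabled
```

Under this rule, the grafana and alertmanager-webhook-adaptor addons in the same listing would also resolve to Enabled, because their newest install pods are Completed.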

After a while, all pods are Running, but the addon status is still Failed.

➜  ~ k get pod -n kb-system
NAME                                                    READY   STATUS    RESTARTS   AGE
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh   2/2     Running   0          32m
kb-addon-grafana-7554cf5785-7dcfl                       3/3     Running   0          32m
kb-addon-prometheus-alertmanager-0                      2/2     Running   0          32m
kb-addon-prometheus-server-0                            2/2     Running   0          32m
kb-addon-snapshot-controller-65fcc74964-ck7s7           1/1     Running   0          48m
kubeblocks-866c7bf687-2q9lj                             1/1     Running   0          51m

➜  ~ kbcli addon list
NAME                           TYPE   STATUS     EXTRAS         AUTO-INSTALL   AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller   Helm   Disabled                  false          {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh                     Helm   Disabled                  false
csi-hostpath-driver            Helm   Disabled                  false          {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3                         Helm   Disabled                  false
kubeblocks-csi-driver          Helm   Disabled   node           false          {key=KubeGitVersion,op=Contains,values=[eks]}
migration                      Helm   Disabled                  false
nyancat                        Helm   Disabled                  false
opensearch                     Helm   Disabled                  false
apecloud-mysql                 Helm   Enabled                   true
milvus                         Helm   Enabled                   true
mongodb                        Helm   Enabled                   true
postgresql                     Helm   Enabled                   true
qdrant                         Helm   Enabled                   true
redis                          Helm   Enabled                   true
snapshot-controller            Helm   Enabled                   true           {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
alertmanager-webhook-adaptor   Helm   Failed                    true
grafana                        Helm   Failed                    true
prometheus                     Helm   Failed     alertmanager   true
weaviate                       Helm   Failed                    true

ahjing99 · May 18 '23 08:05

The requirement was to record a warning event with the reason for the failure, which is why the failed job pod's log ends up as the event message, as shown below. So what exactly is the expectation here?

Warning  InstallationFailedLogs  5m53s  addon-controller  Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition

nashtsai · May 24 '23 08:05

I expect that when the pod status is Running, the addon status should also change to Enabled instead of Failed.
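The expectation can be expressed as a rule that maps an addon's workload pods to a phase. This is a minimal sketch under the assumption that "all pods Running with every container ready" means Enabled; the Pod alias, is_ready and phase_from_pods are hypothetical names, and the set of failure statuses is an illustrative choice.

```python
# Hypothetical row from "k get pod": (NAME, READY, STATUS).
Pod = tuple[str, str, str]


def is_ready(ready: str) -> bool:
    """Return True when a READY column such as "2/2" has all containers ready."""
    up, total = (int(n) for n in ready.split("/"))
    return total > 0 and up == total


def phase_from_pods(pods: list[Pod]) -> str:
    """Map an addon's workload pods to a phase (illustrative rule only)."""
    if not pods:
        return "Enabling"  # nothing scheduled yet
    if any(status in ("Error", "CrashLoopBackOff", "Failed") for _, _, status in pods):
        return "Failed"
    if all(status == "Running" and is_ready(ready) for _, ready, status in pods):
        return "Enabled"
    return "Enabling"  # e.g. Pending, or Running but not yet ready


# The prometheus pods from the later listing in this issue.
prometheus_pods = [
    ("kb-addon-prometheus-alertmanager-0", "2/2", "Running"),
    ("kb-addon-prometheus-server-0", "2/2", "Running"),
]
print(phase_from_pods(prometheus_pods))  # Enabled
```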

ahjing99 · May 24 '23 08:05

> I expect when pod status is running, addon status should also changed to enabled instead of failed

This is not a bug; you could turn this into an improvement request.

nashtsai · Jun 01 '23 04:06

Since the requested functionality is essentially a "Helm Operator" function, and the FluxCD Helm Operator handles it much better, an alternative is to bring in FluxCD as an addon (similar to KubeVela's Helm component approach) and work with the HelmRepository and HelmRelease CRs.

//cc @fireworm2002
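With FluxCD in place, the addon controller could read its phase from the HelmRelease that Flux keeps reconciling, rather than from a one-time Helm job. The sketch below assumes Flux's convention of a "Ready" condition in HelmRelease status.conditions with status "True", "False" or "Unknown"; phase_from_helmrelease is a hypothetical helper and the example object is illustrative, not taken from this cluster.

```python
def phase_from_helmrelease(helmrelease: dict) -> str:
    """Map a HelmRelease object (as a plain dict) to an addon phase.

    Assumption: only the "Ready" condition is consulted; a real
    integration would likely look at other conditions as well.
    """
    conditions = helmrelease.get("status", {}).get("conditions", [])
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    if ready is None or ready.get("status") == "Unknown":
        return "Enabling"  # the controller has not reached a verdict yet
    return "Enabled" if ready.get("status") == "True" else "Failed"


# Illustrative object: a release that failed earlier but has since recovered.
release = {
    "kind": "HelmRelease",
    "metadata": {"name": "kb-addon-prometheus", "namespace": "kb-system"},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}
print(phase_from_helmrelease(release))  # Enabled
```

Because Flux re-evaluates the release on every reconciliation, a release that timed out once would report Ready again after a later successful attempt, which is the behavior this issue asks for.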

nashtsai · Jun 01 '23 04:06

During installation, the addon state should match the pod state so that we can tell whether the installation succeeded. Once the addon is running, however, the pod state can change and the addon state then diverges from it. This is currently by design: the addon lacks an operator to reconcile its state and only uses Helm for installation and upgrade.
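The difference between "Helm only for install" and "an operator that reconciles" can be shown with a toy model. In this hedged sketch, each boolean stands for one observation of whether the addon's workload is healthy; one_shot_install and reconcile are hypothetical functions that only illustrate the two designs, not KubeBlocks code.

```python
from typing import Iterable


def one_shot_install(observations: Iterable[bool]) -> str:
    """Helm-only model: the phase is decided once, from the first observation."""
    first = next(iter(observations), None)
    if first is None:
        return "Enabling"  # nothing observed yet
    return "Enabled" if first else "Failed"


def reconcile(observations: Iterable[bool]) -> str:
    """Reconciler model: the phase is recomputed on every observation."""
    phase = "Enabling"
    for healthy in observations:
        phase = "Enabled" if healthy else "Failed"
    return phase


# Workload unhealthy at install time (the atomic Helm install timed out),
# then healthy on later observations, as in this issue.
history = [False, False, True, True]
print(one_shot_install(history))  # Failed
print(reconcile(history))         # Enabled
```

The one-shot model reproduces what "kbcli addon list" showed here; the reconciling model converges to the workload's current state, at the cost of running a continuous control loop.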

ruijun2002 · Jun 01 '23 06:06