[Improvement] Addon status should sync with pod status
➜ ~ kbcli version
Kubernetes: v1.27.1
KubeBlocks: 0.6.0-alpha.0
kbcli: 0.6.0-alpha.0
➜ ~ kind version
kind v0.19.0 go1.20.4 darwin/arm64
- brew install kind
- create cluster
➜ ~ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind! 😊
- Install KubeBlocks
➜ ~ kbcli kubeblocks install
KubeBlocks will be installed to namespace "kb-system"
Kubernetes version 1.27.1
kbcli version 0.6.0-alpha.0
Add and update repo kubeblocks OK
Install KubeBlocks 0.6.0-alpha.0 OK
Wait for addons to be enabled
Addon alertmanager-webhook-adaptor OK
Addon apecloud-mysql OK
Addon grafana OK
Addon milvus OK
Addon mongodb OK
Addon postgresql OK
Addon prometheus Failed
Addon qdrant OK
Addon redis OK
Addon snapshot-controller OK
Addon weaviate OK
error: timeout waiting for auto-install addons to be enabled, run "kbcli addon list" to check addon status
➜ ~ k get pod -n kb-system
NAME READY STATUS RESTARTS AGE
install-prometheus-addon-lvddt 0/1 Completed 0 3m14s
install-prometheus-addon-nnq68 0/1 Error 0 8m29s
kb-addon-alertmanager-webhook-adaptor-856488566-ktdkl 2/2 Running 0 8m26s
kb-addon-grafana-7554cf5785-fvgzt 3/3 Running 0 8m24s
kb-addon-prometheus-alertmanager-0 2/2 Running 0 3m11s
kb-addon-prometheus-server-0 2/2 Running 0 3m11s
kb-addon-snapshot-controller-65fcc74964-9m8hh 1/1 Running 0 8m24s
kubeblocks-866c7bf687-sbjb4 1/1 Running 0 9m56s
➜ ~ k logs install-prometheus-addon-nnq68 -n kb-system
Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
➜ ~ kbcli addon list
NAME TYPE STATUS EXTRAS AUTO-INSTALL AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller Helm Disabled false {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh Helm Disabled false
csi-hostpath-driver Helm Disabled false {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3 Helm Disabled false
kubeblocks-csi-driver Helm Disabled node false {key=KubeGitVersion,op=Contains,values=[eks]}
migration Helm Disabled false
nyancat Helm Disabled false
opensearch Helm Disabled false
alertmanager-webhook-adaptor Helm Enabled true
apecloud-mysql Helm Enabled true
grafana Helm Enabled true
milvus Helm Enabled true
mongodb Helm Enabled true
postgresql Helm Enabled true
qdrant Helm Enabled true
redis Helm Enabled true
snapshot-controller Helm Enabled true {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
weaviate Helm Enabled true
prometheus Helm Failed alertmanager true
➜ ~ k describe addon prometheus
Name: prometheus
Namespace:
Labels: app.kubernetes.io/instance=kubeblocks
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kubeblocks
app.kubernetes.io/version=0.6.0-alpha.0
helm.sh/chart=kubeblocks-0.6.0-alpha.0
kubeblocks.io/provider=community
Annotations: meta.helm.sh/release-name: kubeblocks
meta.helm.sh/release-namespace: kb-system
API Version: extensions.kubeblocks.io/v1alpha1
Kind: Addon
Metadata:
Creation Timestamp: 2023-05-18T07:32:59Z
Finalizers:
addon.kubeblocks.io/finalizer
Generation: 2
Managed Fields:
API Version: extensions.kubeblocks.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:meta.helm.sh/release-name:
f:meta.helm.sh/release-namespace:
f:labels:
.:
f:app.kubernetes.io/instance:
f:app.kubernetes.io/managed-by:
f:app.kubernetes.io/name:
f:app.kubernetes.io/version:
f:helm.sh/chart:
f:kubeblocks.io/provider:
f:spec:
.:
f:defaultInstallValues:
f:description:
f:helm:
.:
f:chartLocationURL:
f:installValues:
.:
f:configMapRefs:
f:valuesMapping:
.:
f:extras:
.:
k:{"name":"alertmanager"}:
.:
f:jsonMap:
.:
f:tolerations:
f:name:
f:resources:
.:
f:cpu:
.:
f:limits:
f:requests:
f:memory:
.:
f:limits:
f:requests:
f:storage:
f:valueMap:
.:
f:persistentVolumeEnabled:
f:replicaCount:
f:storageClass:
f:jsonMap:
.:
f:tolerations:
f:resources:
.:
f:cpu:
.:
f:limits:
f:requests:
f:memory:
.:
f:limits:
f:requests:
f:storage:
f:valueMap:
.:
f:persistentVolumeEnabled:
f:replicaCount:
f:storageClass:
f:installable:
.:
f:autoInstall:
f:type:
Manager: kbcli
Operation: Update
Time: 2023-05-18T07:32:59Z
API Version: extensions.kubeblocks.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"addon.kubeblocks.io/finalizer":
f:spec:
f:install:
.:
f:enabled:
f:extras:
.:
k:{"name":"alertmanager"}:
.:
f:name:
f:replicas:
f:resources:
.:
f:requests:
.:
f:storage:
f:tolerations:
f:replicas:
f:resources:
.:
f:limits:
.:
f:memory:
f:requests:
.:
f:memory:
f:storage:
f:tolerations:
Manager: manager
Operation: Update
Time: 2023-05-18T07:34:24Z
API Version: extensions.kubeblocks.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:conditions:
f:observedGeneration:
f:phase:
Manager: manager
Operation: Update
Subresource: status
Time: 2023-05-18T07:39:34Z
Resource Version: 2220
UID: 18398099-6e08-4f0a-ac04-9c14f4d8d52f
Spec:
Default Install Values:
Extras:
Name: alertmanager
Replicas: 1
Resources:
Requests:
Storage: 4Gi
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Replicas: 1
Resources:
Limits:
Memory: 4Gi
Requests:
Memory: 512Mi
Storage: 10Gi
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Extras:
Name: alertmanager
Replicas: 1
Resources:
Requests:
Storage: 20Gi
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Replicas: 1
Resources:
Limits:
Memory: 4Gi
Requests:
Memory: 512Mi
Storage: 20Gi
Selectors:
Key: KubeGitVersion
Operator: Contains
Values:
aliyun
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Extras:
Name: alertmanager
Replicas: 1
Resources:
Requests:
Storage: 10Gi
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Replicas: 1
Resources:
Limits:
Memory: 4Gi
Requests:
Memory: 512Mi
Storage: 10Gi
Selectors:
Key: KubeGitVersion
Operator: Contains
Values:
tke
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Description: Prometheus is a monitoring system and time series database.
Helm:
Chart Location URL: https://jihulab.com/api/v4/projects/85949/packages/helm/stable/charts/prometheus-15.16.1.tgz
Install Values:
Config Map Refs:
Key: values-kubeblocks-override.yaml
Name: prometheus-chart-kubeblocks-values
Values Mapping:
Extras:
Json Map:
Tolerations: alertmanager.tolerations
Name: alertmanager
Resources:
Cpu:
Limits: alertmanager.resources.limits.cpu
Requests: alertmanager.resources.requests.cpu
Memory:
Limits: alertmanager.resources.limits.memory
Requests: alertmanager.resources.requests.memory
Storage: alertmanager.persistentVolume.size
Value Map:
Persistent Volume Enabled: alertmanager.persistentVolume.enabled
Replica Count: alertmanager.replicaCount
Storage Class: alertmanager.persistentVolume.storageClass
Json Map:
Tolerations: server.tolerations
Resources:
Cpu:
Limits: server.resources.limits.cpu
Requests: server.resources.requests.cpu
Memory:
Limits: server.resources.limits.memory
Requests: server.resources.requests.memory
Storage: server.persistentVolume.size
Value Map:
Persistent Volume Enabled: server.persistentVolume.enabled
Replica Count: server.replicaCount
Storage Class: server.persistentVolume.storageClass
Install:
Enabled: true
Extras:
Name: alertmanager
Replicas: 1
Resources:
Requests:
Storage: 4Gi
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Replicas: 1
Resources:
Limits:
Memory: 4Gi
Requests:
Memory: 512Mi
Storage: 10Gi
Tolerations: [{"effect":"NoSchedule","key":"kb-controller","operator":"Equal","value":"true"}]
Installable:
Auto Install: true
Type: Helm
Status:
Conditions:
Last Transition Time: 2023-05-18T07:39:34Z
Message: Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
Observed Generation: 2
Reason: InstallationFailedLogs
Status: False
Type: InstallableChecked
Observed Generation: 2
Phase: Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal AddonAutoInstall 11m addon-controller Addon enabled auto-install
Normal EnablingAddon 11m addon-controller Progress to Enabling phase
Warning InstallationFailed 5m53s addon-controller Installation failed, do inspect error from jobs.batch kb-system/install-prometheus-addon
Warning InstallationFailedLogs 5m53s addon-controller Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
I tried again. The addons were eventually installed, but kbcli addon list still reports their status as Failed. The addon status should sync with the actual pod status.
➜ ~ kbcli addon list
NAME TYPE STATUS EXTRAS AUTO-INSTALL AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller Helm Disabled false {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh Helm Disabled false
csi-hostpath-driver Helm Disabled false {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3 Helm Disabled false
kubeblocks-csi-driver Helm Disabled node false {key=KubeGitVersion,op=Contains,values=[eks]}
migration Helm Disabled false
nyancat Helm Disabled false
opensearch Helm Disabled false
apecloud-mysql Helm Enabled true
milvus Helm Enabled true
mongodb Helm Enabled true
postgresql Helm Enabled true
qdrant Helm Enabled true
redis Helm Enabled true
snapshot-controller Helm Enabled true {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
alertmanager-webhook-adaptor Helm Failed true
grafana Helm Failed true
prometheus Helm Failed alertmanager true
weaviate Helm Failed true
➜ ~ k get pod -n kb-system
NAME READY STATUS RESTARTS AGE
install-alertmanager-webhook-adaptor-addon-gt8wf 0/1 Error 0 6m34s
install-alertmanager-webhook-adaptor-addon-ljr2h 0/1 Error 0 17m
install-alertmanager-webhook-adaptor-addon-mwgxm 0/1 Error 0 11m
install-alertmanager-webhook-adaptor-addon-vm28h 0/1 Completed 0 48s
install-grafana-addon-b8ffw 0/1 Completed 0 49s
install-grafana-addon-gvknd 0/1 Error 0 6m32s
install-grafana-addon-x2h8l 0/1 Error 0 11m
install-grafana-addon-zxdxl 0/1 Error 0 17m
install-prometheus-addon-gss6v 0/1 Error 0 17m
install-prometheus-addon-jn5k6 0/1 Error 0 11m
install-prometheus-addon-nwhvm 0/1 Error 0 6m32s
install-prometheus-addon-zflbw 0/1 Completed 0 49s
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh 2/2 Running 0 46s
kb-addon-grafana-7554cf5785-7dcfl 3/3 Running 0 46s
kb-addon-prometheus-alertmanager-0 2/2 Running 0 45s
kb-addon-prometheus-server-0 2/2 Running 0 40s
kb-addon-snapshot-controller-65fcc74964-ck7s7 1/1 Running 0 17m
kubeblocks-866c7bf687-2q9lj 1/1 Running 0 19m
After a while, all pods are running, but the addon status is still Failed.
➜ ~ k get pod -n kb-system
NAME READY STATUS RESTARTS AGE
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh 2/2 Running 0 32m
kb-addon-grafana-7554cf5785-7dcfl 3/3 Running 0 32m
kb-addon-prometheus-alertmanager-0 2/2 Running 0 32m
kb-addon-prometheus-server-0 2/2 Running 0 32m
kb-addon-snapshot-controller-65fcc74964-ck7s7 1/1 Running 0 48m
kubeblocks-866c7bf687-2q9lj 1/1 Running 0 51m
➜ ~ kbcli addon list
NAME TYPE STATUS EXTRAS AUTO-INSTALL AUTO-INSTALLABLE-SELECTOR
aws-load-balancer-controller Helm Disabled false {key=KubeGitVersion,op=Contains,values=[eks]}
chaos-mesh Helm Disabled false
csi-hostpath-driver Helm Disabled false {key=KubeGitVersion,op=DoesNotContain,values=[eks aliyun gke tke aks]}
csi-s3 Helm Disabled false
kubeblocks-csi-driver Helm Disabled node false {key=KubeGitVersion,op=Contains,values=[eks]}
migration Helm Disabled false
nyancat Helm Disabled false
opensearch Helm Disabled false
apecloud-mysql Helm Enabled true
milvus Helm Enabled true
mongodb Helm Enabled true
postgresql Helm Enabled true
qdrant Helm Enabled true
redis Helm Enabled true
snapshot-controller Helm Enabled true {key=KubeGitVersion,op=DoesNotContain,values=[tke]}
alertmanager-webhook-adaptor Helm Failed true
grafana Helm Failed true
prometheus Helm Failed alertmanager true
weaviate Helm Failed true
The requirement is to record a warning event with the reason the installation failed, which is why the log of the failed job pod appears as the event message, as shown below. So what exactly is the expectation here?
Warning InstallationFailedLogs 5m53s addon-controller Release "kb-addon-prometheus" does not exist. Installing it now.
Error: release kb-addon-prometheus failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
I expect that when the pods are running, the addon status also changes to Enabled instead of staying Failed.
This is not a bug; you could turn this into an improvement request.
The requested behavior is a "Helm Operator" function, and the FluxCD Helm Operator is much better suited to handling it. An alternative is to bring in FluxCD as an addon (similar to KubeVela's Helm component approach) and work with the HelmRepository and HelmRelease CRs.
//cc @fireworm2002
When an addon is installed, its state should match the pod state, which determines whether the installation succeeded. Once the addon is running, the pod state can change and the addon state diverges from it. This is currently by design, because the addon has no operator to reconcile its state and relies only on Helm for installation and upgrade.
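As a rough illustration of the behavior requested in this thread, the sketch below derives an expected addon status from the rows printed by k get pod. It is not KubeBlocks code: the function names, the "Unknown" result, and the assumption that an addon's workload pods are named kb-addon-<addon>-... (inferred from the pod listings above) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ready: str   # READY column, e.g. "2/2"
    status: str  # STATUS column, e.g. "Running"


def parse_pods(table: str) -> list:
    """Parse the text table printed by `kubectl get pod` (header row skipped)."""
    rows = [line.split() for line in table.strip().splitlines()]
    return [Pod(r[0], r[1], r[2]) for r in rows[1:] if len(r) >= 3]


def is_ready(pod: Pod) -> bool:
    """A pod counts as ready when it is Running and all containers are ready."""
    ready, total = pod.ready.split("/")
    return pod.status == "Running" and ready == total


def expected_addon_status(addon: str, pods: list) -> str:
    # Assumption: workload pods of an addon share the prefix "kb-addon-<addon>-".
    # Install job pods ("install-<addon>-addon-...") do not match and are ignored.
    prefix = f"kb-addon-{addon}-"
    workload = [p for p in pods if p.name.startswith(prefix)]
    if not workload:
        # Hypothetical result: there is no pod to sync with.
        return "Unknown"
    return "Enabled" if all(is_ready(p) for p in workload) else "Failed"


# Pod listing copied from the last `k get pod -n kb-system` output above.
SAMPLE = """
NAME READY STATUS RESTARTS AGE
kb-addon-alertmanager-webhook-adaptor-856488566-hrdjh 2/2 Running 0 32m
kb-addon-grafana-7554cf5785-7dcfl 3/3 Running 0 32m
kb-addon-prometheus-alertmanager-0 2/2 Running 0 32m
kb-addon-prometheus-server-0 2/2 Running 0 32m
kb-addon-snapshot-controller-65fcc74964-ck7s7 1/1 Running 0 48m
kubeblocks-866c7bf687-2q9lj 1/1 Running 0 51m
"""

pods = parse_pods(SAMPLE)
for addon in ("alertmanager-webhook-adaptor", "grafana", "prometheus"):
    print(addon, expected_addon_status(addon, pods))  # each prints "<addon> Enabled"
```

With the listing above, the three addons that kbcli addon list reports as Failed would all come out as Enabled. Addons without pods in kb-system, such as weaviate here, have nothing to sync with, which is why the sketch returns a separate "Unknown" value rather than guessing.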