kubeblocks
kubeblocks copied to clipboard
[BUG] mongodb restore datafile backup with large data failed
Describe the bug A clear and concise description of what the bug is.
kbcli version
Kubernetes: v1.30.6
KubeBlocks: 0.9.4-beta.4
kbcli: 0.9.4-beta.1
To Reproduce Steps to reproduce the behavior:
- create cluster
kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
name: mongodb-cluster
namespace: default
spec:
terminationPolicy: WipeOut
componentSpecs:
- name: mongodb
componentDef: mongodb
serviceVersion: 6.0.16
replicas: 3
resources:
requests:
cpu: 8
memory: 16Gi
limits:
cpu: 16
memory: 32Gi
volumeClaimTemplates:
- name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Ti
EOF
- load large date 8000000000+
- datafile backup and load data at the same time
➜ ~ kbcli cluster backup mongodb-cluster --method datafile
➜ ~ kubectl get backup backup-default-mongodb-cluster-20250324094728
NAME POLICY METHOD REPO STATUS TOTAL-SIZE DURATION CREATION-TIME COMPLETION-TIME EXPIRATION-TIME
backup-default-mongodb-cluster-20250324094728 mongodb-cluster-mongodb-backup-policy datafile backuprepo-kbcli-test Completed 168923946760 2h48m11s 2025-03-24T01:47:29Z 2025-03-24T04:35:41Z
- restore
kbcli cluster restore mongodb-restore --backup backup-default-mongodb-cluster-20250324094728
- See error
kubectl get ops -l app.kubernetes.io/instance=mongodb-restore
NAME TYPE CLUSTER STATUS PROGRESS AGE
mongodb-restore Restore mongodb-restore Running -/- 4h40m
kubectl get pod -l app.kubernetes.io/instance=mongodb-restore
NAME READY STATUS RESTARTS AGE
mongodb-restore-mongodb-0 2/2 Running 0 136m
mongodb-restore-mongodb-1 2/2 Running 0 136m
mongodb-restore-mongodb-2 2/2 Running 0 136m
kubectl get cluster mongodb-restore
NAME CLUSTER-DEFINITION VERSION TERMINATION-POLICY STATUS AGE
mongodb-restore WipeOut Creating 4h40m
kbcli cluster list-instances mongodb-restore
NAME NAMESPACE CLUSTER COMPONENT STATUS ROLE ACCESSMODE AZ CPU(REQUEST/LIMIT) MEMORY(REQUEST/LIMIT) STORAGE NODE CREATED-TIME
mongodb-restore-mongodb-0 default mongodb-restore mongodb Running <none> <none> 0 8 / 16 16Gi / 32Gi data:2Ti aks-backuplarge-28239716-vmss00000g/10.224.0.11 Mar 24,2025 19:44 UTC+0800
mongodb-restore-mongodb-1 default mongodb-restore mongodb Running <none> <none> 0 8 / 16 16Gi / 32Gi data:2Ti aks-backuplarge-28239716-vmss00000h/10.224.0.12 Mar 24,2025 19:45 UTC+0800
mongodb-restore-mongodb-2 default mongodb-restore mongodb Running <none> <none> 0 8 / 16 16Gi / 32Gi data:2Ti aks-backuplarge-28239716-vmss00000l/10.224.0.7 Mar 24,2025 19:45 UTC+0800
logs pod
kubectl logs mongodb-restore-mongodb-0
Defaulted container "mongodb" out of: mongodb, lorry, init-syncer (init)
2025-03-24T11:45:07Z DEBUG Starting syncer {"version": "83d705a6e46885297cbf93506e3519032efe13c9"}
2025-03-24T11:45:07Z INFO Initialize DB manager
2025-03-24T11:45:07Z DEBUG KB_WORKLOAD_TYPE ENV not set
2025-03-24T11:45:07Z INFO HTTPServer Starting HTTP Server
2025-03-24T11:45:07Z INFO HTTPServer API route path {"method": "GET", "path": ["/v1.0/datasync", "/v1.0/getrole"]}
2025-03-24T11:45:07Z INFO HTTPServer API route path {"method": "POST", "path": ["/v1.0/rebuild", "/v1.0/start", "/v1.0/stop", "/v1.0/switchover"]}
2025-03-24T11:45:07Z INFO HA HA starting
2025-03-24T11:45:07Z INFO pinger Waiting for dns resolution to be ready
2025-03-24T11:45:07Z INFO HTTPServer http server {"listen address": "0.0.0.0", "port": 3601}
2025-03-24T11:45:07Z INFO pinger dns resolution is ready {"dns": "mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc"}
2025-03-24T11:45:07Z INFO DCS-K8S pod selector: app.kubernetes.io/instance=mongodb-restore,app.kubernetes.io/managed-by=kubeblocks,apps.kubeblocks.io/component-name=mongodb
2025-03-24T11:45:07Z INFO DCS-K8S podlist: 1
2025-03-24T11:45:07Z INFO DCS-K8S Leader configmap is not found {"configmap": "mongodb-restore-mongodb-leader"}
2025-03-24T11:45:07Z DEBUG HA cluster info {"cluster": {"ClusterCompName":"mongodb-restore-mongodb","Namespace":"default","Replicas":3,"HaConfig":{"DeleteMembers":null},"Leader":null,"Members":[{"Index":"","Name":"mongodb-restore-mongodb-0","Role":"","PodIP":"10.244.0.54","DBPort":"27017","SyncerPort":"3601","UID":"b4c19851-3de8-48b4-8aff-64594ddc2f27","ComponentName":"mongodb","UseIP":false}],"Switchover":null,"Extra":null,"Resource":{"metadata":{"name":"mongodb-restore","namespace":"default","uid":"49f603d0-b2ed-4364-90a3-c559dce279f1","resourceVersion":"62571254","generation":1,"creationTimestamp":"2025-03-24T09:21:19Z","annotations":{"kubeblocks.io/ops-request":"[{\"name\":\"mongodb-restore\",\"type\":\"Restore\"}]","kubeblocks.io/restore-from-backup":"{\"mongodb\":{\"doReadyRestoreAfterClusterRunning\":\"false\",\"encryptedSystemAccounts\":\"{\\\"root\\\":\\\"ndM92DGwOhMMmiUIY8f1+KyE8zSTUF3mqJvDjtMUQwB6jxpIfxcc0902MGs=\\\"}\",\"name\":\"backup-default-mongodb-cluster-20250324094728\",\"namespace\":\"default\",\"volumeRestorePolicy\":\"Parallel\"}}"},"finalizers":["cluster.kubeblocks.io/finalizer"],"managedFields":[{"manager":"kubeblocks","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2025-03-24T09:21:20Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubeblocks.io/ops-request":{},"f:kubeblocks.io/restore-from-backup":{}},"f:finalizers":{".":{},"v:\"cluster.kubeblocks.io/finalizer\"":{}}},"f:spec":{".":{},"f:componentSpecs":{},"f:resources":{".":{},"f:cpu":{},"f:memory":{}},"f:storage":{".":{},"f:size":{}},"f:terminationPolicy":{}}}},{"manager":"kubeblocks","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2025-03-24T11:44:58Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:components":{".":{},"f:mongodb":{".":{},"f:phase":{},"f:podsReady":{}}},"f:conditions":{},"f:observedGeneration":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"terminationPolicy":"WipeOut","componentSpecs":[{"name":"mongodb","componentDef":"mongodb","replicas":3,"resources":{"limits":{"cpu":"16","memory":"32Gi"},"requests":{"cpu":"8","memory":"16Gi"}},"volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"2Ti"}}}}]}],"resources":{"cpu":"0","memory":"0"},"storage":{"size":"0"},"monitor":{}},"status":{"observedGeneration":1,"phase":"Creating","components":{"mongodb":{"phase":"Creating","podsReady":false}},"conditions":[{"type":"ProvisioningStarted","status":"True","observedGeneration":1,"lastTransitionTime":"2025-03-24T09:21:19Z","reason":"PreCheckSucceed","message":"The operator has started the provisioning of Cluster: mongodb-restore"},{"type":"ApplyResources","status":"True","observedGeneration":1,"lastTransitionTime":"2025-03-24T09:21:19Z","reason":"ApplyResourcesSucceed","message":"Successfully applied for resources"},{"type":"ReplicasReady","status":"False","lastTransitionTime":"2025-03-24T09:21:20Z","reason":"ReplicasNotReady","message":"pods are not ready in Components: [mongodb], refer to related component message in Cluster.status.components"},{"type":"Ready","status":"False","lastTransitionTime":"2025-03-24T09:21:20Z","reason":"ComponentsNotReady","message":"pods are unavailable in Components: [mongodb], refer to related component message in Cluster.status.components"}]}}}}
2025-03-24T11:45:07Z INFO Hypervisor Starting Hypervisor
2025-03-24T11:45:07Z INFO Hypervisor Start DB Service {"command": "/scripts/replicaset-setup.sh /scripts/replicaset-setup.sh"}
2025-03-24T11:45:07Z INFO HA check if DB Service is running
2025-03-24T11:45:07Z INFO HA DB Service is running
2025-03-24T11:45:07Z INFO Hypervisor Starting watcher on dbService
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:45:08Z INFO MongoDB Get replSet status failed {"error": "replSetGetStatus: server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }", "errorVerbose": "server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }\nreplSetGetStatus\ngithub.com/apecloud/syncer/engines/mongodb.GetReplSetStatus\n\t/src/engines/mongodb/replset.go:35\ngithub.com/apecloud/syncer/engines/mongodb.(*Manager).IsClusterInitialized\n\t/src/engines/mongodb/manager.go:154\ngithub.com/apecloud/syncer/highavailability.(*Ha).Start\n\t/src/highavailability/ha.go:261\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:45:38Z INFO MongoDB Get replSet status with local unauth client failed {"error": "server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:46:08Z INFO MongoDB Get replSet status with local auth client failed {"error": "replSetGetStatus: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:46:08Z INFO HA Waiting for the database cluster to be initialized.
2025-03-24T11:46:08Z INFO HA Initialize the database cluster Failed. {"error": "replSetGetStatus: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:46:14Z INFO MongoDB Get replSet status failed {"error": "replSetGetStatus: server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }", "errorVerbose": "server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }\nreplSetGetStatus\ngithub.com/apecloud/syncer/engines/mongodb.GetReplSetStatus\n\t/src/engines/mongodb/replset.go:35\ngithub.com/apecloud/syncer/engines/mongodb.(*Manager).IsClusterInitialized\n\t/src/engines/mongodb/manager.go:154\ngithub.com/apecloud/syncer/highavailability.(*Ha).Start\n\t/src/highavailability/ha.go:277\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
logs lorry
➜ ~ kubectl logs mongodb-restore-mongodb-0 lorry
2025-03-24T11:45:07Z INFO Initialize DB manager
2025-03-24T11:45:07Z INFO KB_WORKLOAD_TYPE ENV not set
2025-03-24T11:45:07Z INFO Volume-Protection succeed to init volume protection {"pod": "mongodb-restore-mongodb-0", "spec": {"highWatermark":"0","volumes":[]}}
2025-03-24T11:45:07Z INFO HTTPServer Starting HTTP Server
2025-03-24T11:45:07Z INFO HTTPServer API route path {"method": "POST", "path": ["/v1.0/datadump", "/v1.0/joinmember", "/v1.0/rebuild", "/v1.0/lockinstance", "/v1.0/preterminate", "/v1.0/checkrunning", "/v1.0/switchover", "/v1.0/leavemember", "/v1.0/revokeuserrole", "/v1.0/dataload", "/v1.0/unlockinstance", "/v1.0/deleteuser", "/v1.0/getlag", "/v1.0/createuser", "/v1.0/volumeprotection", "/v1.0/exec", "/v1.0/grantuserrole", "/v1.0/postprovision"]}
2025-03-24T11:45:07Z INFO HTTPServer API route path {"method": "GET", "path": ["/v1.0/getrole", "/v1.0/healthycheck", "/v1.0/query", "/v1.0/checkrole", "/v1.0/listusers", "/v1.0/describeuser", "/v1.0/listsystemaccounts"]}
2025-03-24T11:45:07Z INFO cronjobs env is not set {"env": "KB_CRON_JOBS"}
2025-03-24T11:45:17Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:18Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:20Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:22Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:24Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:26Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:28Z INFO MongoDB DB is not ready {"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
Additional context Add any other context about the problem here.