kubeblocks icon indicating copy to clipboard operation
kubeblocks copied to clipboard

[BUG] mongodb restore datafile backup with large data failed

Open JashBook opened this issue 9 months ago • 0 comments

Describe the bug A clear and concise description of what the bug is.

kbcli version
Kubernetes: v1.30.6
KubeBlocks: 0.9.4-beta.4
kbcli: 0.9.4-beta.1

To Reproduce Steps to reproduce the behavior:

  1. create cluster
kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mongodb-cluster
  namespace: default
spec:
  terminationPolicy: WipeOut
  componentSpecs:
  - name: mongodb
    componentDef: mongodb
    serviceVersion: 6.0.16
    replicas: 3
    resources:
      requests:
        cpu: 8
        memory: 16Gi
      limits:
        cpu: 16
        memory: 32Gi
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
EOF
  1. load large date 8000000000+
  2. datafile backup and load data at the same time
➜  ~ kbcli cluster backup mongodb-cluster --method datafile
➜  ~ kubectl get backup backup-default-mongodb-cluster-20250324094728 
NAME                                            POLICY                                  METHOD     REPO                    STATUS      TOTAL-SIZE     DURATION   CREATION-TIME          COMPLETION-TIME        EXPIRATION-TIME
backup-default-mongodb-cluster-20250324094728   mongodb-cluster-mongodb-backup-policy   datafile   backuprepo-kbcli-test   Completed   168923946760   2h48m11s   2025-03-24T01:47:29Z   2025-03-24T04:35:41Z   
  1. restore
kbcli cluster restore mongodb-restore --backup backup-default-mongodb-cluster-20250324094728
  1. See error
 kubectl get ops -l app.kubernetes.io/instance=mongodb-restore    
NAME              TYPE      CLUSTER           STATUS    PROGRESS   AGE
mongodb-restore   Restore   mongodb-restore   Running   -/-        4h40m

kubectl get pod -l app.kubernetes.io/instance=mongodb-restore
NAME                        READY   STATUS    RESTARTS   AGE
mongodb-restore-mongodb-0   2/2     Running   0          136m
mongodb-restore-mongodb-1   2/2     Running   0          136m
mongodb-restore-mongodb-2   2/2     Running   0          136m

kubectl get cluster mongodb-restore
NAME              CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
mongodb-restore                                  WipeOut              Creating   4h40m

kbcli cluster list-instances mongodb-restore
NAME                        NAMESPACE   CLUSTER           COMPONENT   STATUS    ROLE     ACCESSMODE   AZ   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE    NODE                                              CREATED-TIME                 
mongodb-restore-mongodb-0   default     mongodb-restore   mongodb     Running   <none>   <none>       0    8 / 16               16Gi / 32Gi             data:2Ti   aks-backuplarge-28239716-vmss00000g/10.224.0.11   Mar 24,2025 19:44 UTC+0800   
mongodb-restore-mongodb-1   default     mongodb-restore   mongodb     Running   <none>   <none>       0    8 / 16               16Gi / 32Gi             data:2Ti   aks-backuplarge-28239716-vmss00000h/10.224.0.12   Mar 24,2025 19:45 UTC+0800   
mongodb-restore-mongodb-2   default     mongodb-restore   mongodb     Running   <none>   <none>       0    8 / 16               16Gi / 32Gi             data:2Ti   aks-backuplarge-28239716-vmss00000l/10.224.0.7    Mar 24,2025 19:45 UTC+0800  

logs pod

kubectl logs mongodb-restore-mongodb-0
Defaulted container "mongodb" out of: mongodb, lorry, init-syncer (init)
2025-03-24T11:45:07Z	DEBUG	Starting syncer	{"version": "83d705a6e46885297cbf93506e3519032efe13c9"}
2025-03-24T11:45:07Z	INFO	Initialize DB manager
2025-03-24T11:45:07Z	DEBUG	KB_WORKLOAD_TYPE ENV not set
2025-03-24T11:45:07Z	INFO	HTTPServer	Starting HTTP Server
2025-03-24T11:45:07Z	INFO	HTTPServer	API route path	{"method": "GET", "path": ["/v1.0/datasync", "/v1.0/getrole"]}
2025-03-24T11:45:07Z	INFO	HTTPServer	API route path	{"method": "POST", "path": ["/v1.0/rebuild", "/v1.0/start", "/v1.0/stop", "/v1.0/switchover"]}
2025-03-24T11:45:07Z	INFO	HA	HA starting
2025-03-24T11:45:07Z	INFO	pinger	Waiting for dns resolution to be ready
2025-03-24T11:45:07Z	INFO	HTTPServer	http server	{"listen address": "0.0.0.0", "port": 3601}
2025-03-24T11:45:07Z	INFO	pinger	dns resolution is ready	{"dns": "mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc"}
2025-03-24T11:45:07Z	INFO	DCS-K8S	pod selector: app.kubernetes.io/instance=mongodb-restore,app.kubernetes.io/managed-by=kubeblocks,apps.kubeblocks.io/component-name=mongodb
2025-03-24T11:45:07Z	INFO	DCS-K8S	podlist: 1
2025-03-24T11:45:07Z	INFO	DCS-K8S	Leader configmap is not found	{"configmap": "mongodb-restore-mongodb-leader"}
2025-03-24T11:45:07Z	DEBUG	HA	cluster info	{"cluster": {"ClusterCompName":"mongodb-restore-mongodb","Namespace":"default","Replicas":3,"HaConfig":{"DeleteMembers":null},"Leader":null,"Members":[{"Index":"","Name":"mongodb-restore-mongodb-0","Role":"","PodIP":"10.244.0.54","DBPort":"27017","SyncerPort":"3601","UID":"b4c19851-3de8-48b4-8aff-64594ddc2f27","ComponentName":"mongodb","UseIP":false}],"Switchover":null,"Extra":null,"Resource":{"metadata":{"name":"mongodb-restore","namespace":"default","uid":"49f603d0-b2ed-4364-90a3-c559dce279f1","resourceVersion":"62571254","generation":1,"creationTimestamp":"2025-03-24T09:21:19Z","annotations":{"kubeblocks.io/ops-request":"[{\"name\":\"mongodb-restore\",\"type\":\"Restore\"}]","kubeblocks.io/restore-from-backup":"{\"mongodb\":{\"doReadyRestoreAfterClusterRunning\":\"false\",\"encryptedSystemAccounts\":\"{\\\"root\\\":\\\"ndM92DGwOhMMmiUIY8f1+KyE8zSTUF3mqJvDjtMUQwB6jxpIfxcc0902MGs=\\\"}\",\"name\":\"backup-default-mongodb-cluster-20250324094728\",\"namespace\":\"default\",\"volumeRestorePolicy\":\"Parallel\"}}"},"finalizers":["cluster.kubeblocks.io/finalizer"],"managedFields":[{"manager":"kubeblocks","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2025-03-24T09:21:20Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubeblocks.io/ops-request":{},"f:kubeblocks.io/restore-from-backup":{}},"f:finalizers":{".":{},"v:\"cluster.kubeblocks.io/finalizer\"":{}}},"f:spec":{".":{},"f:componentSpecs":{},"f:resources":{".":{},"f:cpu":{},"f:memory":{}},"f:storage":{".":{},"f:size":{}},"f:terminationPolicy":{}}}},{"manager":"kubeblocks","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2025-03-24T11:44:58Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:components":{".":{},"f:mongodb":{".":{},"f:phase":{},"f:podsReady":{}}},"f:conditions":{},"f:observedGeneration":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"terminationPolicy":"WipeOut","componentSpecs":[{"name":"mongodb","componentDef":"mongodb","replicas":3,"resources":{"limits":{"cpu":"16","memory":"32Gi"},"requests":{"cpu":"8","memory":"16Gi"}},"volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"2Ti"}}}}]}],"resources":{"cpu":"0","memory":"0"},"storage":{"size":"0"},"monitor":{}},"status":{"observedGeneration":1,"phase":"Creating","components":{"mongodb":{"phase":"Creating","podsReady":false}},"conditions":[{"type":"ProvisioningStarted","status":"True","observedGeneration":1,"lastTransitionTime":"2025-03-24T09:21:19Z","reason":"PreCheckSucceed","message":"The operator has started the provisioning of Cluster: mongodb-restore"},{"type":"ApplyResources","status":"True","observedGeneration":1,"lastTransitionTime":"2025-03-24T09:21:19Z","reason":"ApplyResourcesSucceed","message":"Successfully applied for resources"},{"type":"ReplicasReady","status":"False","lastTransitionTime":"2025-03-24T09:21:20Z","reason":"ReplicasNotReady","message":"pods are not ready in Components: [mongodb], refer to related component message in Cluster.status.components"},{"type":"Ready","status":"False","lastTransitionTime":"2025-03-24T09:21:20Z","reason":"ComponentsNotReady","message":"pods are unavailable in Components: [mongodb], refer to related component message in Cluster.status.components"}]}}}}
2025-03-24T11:45:07Z	INFO	Hypervisor	Starting Hypervisor
2025-03-24T11:45:07Z	INFO	Hypervisor	Start DB Service	{"command": "/scripts/replicaset-setup.sh /scripts/replicaset-setup.sh"}
2025-03-24T11:45:07Z	INFO	HA	check if DB Service is running
2025-03-24T11:45:07Z	INFO	HA	DB Service is running
2025-03-24T11:45:07Z	INFO	Hypervisor	Starting watcher on dbService
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:45:08Z	INFO	MongoDB	Get replSet status failed	{"error": "replSetGetStatus: server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }", "errorVerbose": "server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }\nreplSetGetStatus\ngithub.com/apecloud/syncer/engines/mongodb.GetReplSetStatus\n\t/src/engines/mongodb/replset.go:35\ngithub.com/apecloud/syncer/engines/mongodb.(*Manager).IsClusterInitialized\n\t/src/engines/mongodb/manager.go:154\ngithub.com/apecloud/syncer/highavailability.(*Ha).Start\n\t/src/highavailability/ha.go:261\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:45:38Z	INFO	MongoDB	Get replSet status with local unauth client failed	{"error": "server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:46:08Z	INFO	MongoDB	Get replSet status with local auth client failed	{"error": "replSetGetStatus: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:46:08Z	INFO	HA	Waiting for the database cluster to be initialized.
2025-03-24T11:46:08Z	INFO	HA	Initialize the database cluster Failed.	{"error": "replSetGetStatus: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027
2025-03-24T11:46:14Z	INFO	MongoDB	Get replSet status failed	{"error": "replSetGetStatus: server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }", "errorVerbose": "server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb-restore-mongodb-0.mongodb-restore-mongodb-headless.default.svc:27017, Type: Unknown, Last error: dial tcp 10.244.0.54:27017: connect: connection refused }, ] }\nreplSetGetStatus\ngithub.com/apecloud/syncer/engines/mongodb.GetReplSetStatus\n\t/src/engines/mongodb/replset.go:35\ngithub.com/apecloud/syncer/engines/mongodb.(*Manager).IsClusterInitialized\n\t/src/engines/mongodb/manager.go:154\ngithub.com/apecloud/syncer/highavailability.(*Ha).Start\n\t/src/highavailability/ha.go:277\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"}
== DB ERR == MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27027

logs lorry

➜  ~ kubectl logs mongodb-restore-mongodb-0 lorry
2025-03-24T11:45:07Z	INFO	Initialize DB manager
2025-03-24T11:45:07Z	INFO	KB_WORKLOAD_TYPE ENV not set
2025-03-24T11:45:07Z	INFO	Volume-Protection	succeed to init volume protection	{"pod": "mongodb-restore-mongodb-0", "spec": {"highWatermark":"0","volumes":[]}}
2025-03-24T11:45:07Z	INFO	HTTPServer	Starting HTTP Server
2025-03-24T11:45:07Z	INFO	HTTPServer	API route path	{"method": "POST", "path": ["/v1.0/datadump", "/v1.0/joinmember", "/v1.0/rebuild", "/v1.0/lockinstance", "/v1.0/preterminate", "/v1.0/checkrunning", "/v1.0/switchover", "/v1.0/leavemember", "/v1.0/revokeuserrole", "/v1.0/dataload", "/v1.0/unlockinstance", "/v1.0/deleteuser", "/v1.0/getlag", "/v1.0/createuser", "/v1.0/volumeprotection", "/v1.0/exec", "/v1.0/grantuserrole", "/v1.0/postprovision"]}
2025-03-24T11:45:07Z	INFO	HTTPServer	API route path	{"method": "GET", "path": ["/v1.0/getrole", "/v1.0/healthycheck", "/v1.0/query", "/v1.0/checkrole", "/v1.0/listusers", "/v1.0/describeuser", "/v1.0/listsystemaccounts"]}
2025-03-24T11:45:07Z	INFO	cronjobs	env is not set	{"env": "KB_CRON_JOBS"}
2025-03-24T11:45:17Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:18Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:20Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:22Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:24Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:26Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}
2025-03-24T11:45:28Z	INFO	MongoDB	DB is not ready	{"error": "server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }"}

Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Additional context Add any other context about the problem here.

JashBook avatar Mar 24 '25 08:03 JashBook