
[Bug]: milvus-backup backup is successful, but restoration from the backup is failing

Open fengchen8556203 opened this issue 8 months ago • 8 comments

Current Behavior

Output of milvus-backup check:

Succeed to connect to milvus and storage.
Milvus version: 2.5.11
Storage:
milvus-bucket: milvus-bucket
milvus-rootpath: file
backup-bucket: a-bucket
backup-rootpath: backup

milvus-backup: 0.5.4

Command Execution: The collection is restored with: ./milvus-backup restore -n BookEmbeddingByXml_time_2025_04_30_17_39_58 -s _bak

Observed Behavior: The process gets stuck at this point:

Log Output: [2025/04/30 17:42:05.992 +08:00] [INFO] [restore/collection.go:968] ["bulk insert task status"] [backup_db_name=default] [backup_collection_name=BookEmbeddingByXml] [target_db_name=default] [target_collection_name=BookEmbeddingByXml_bak] [jobID=457681197306260455] [state=ImportStarted] [backup="[{"key":"failed_reason"},{"key":"progress_percent","value":"70"}]"]

Expected Behavior

No response

Steps To Reproduce


Environment


Anything else?

No response

fengchen8556203 avatar Apr 30 '25 09:04 fengchen8556203

@fengchen8556203 Could you check if the storage (object storage, i.e. MinIO) is full or some drives are offline?

Andy6132024 avatar Apr 30 '25 10:04 Andy6132024

Everything checked out fine. The restore did finish after a long wait, but it took significantly longer than it did with the previous version (2.3.11).

fengchen8556203 avatar May 06 '25 00:05 fengchen8556203

Based on the logs, the restore task is stuck at 70% with the status ImportStarted, which usually indicates that the data import phase has completed, and the system is now building indexes. Please check the logs of Milvus's datanode and indexnode to see if there are any errors or issues related to index building, as well as their load conditions during this process.

If the logs indicate any specific errors or unusual load, feel free to share them with us so we can assist you further.
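To tell a stalled job apart from one that is only progressing slowly, a small script can measure how long the reported state and progress have stayed the same. The following is a minimal sketch, assuming the status lines keep the format quoted earlier in this issue (a leading timestamp, `[jobID=...]`, `[state=...]`, and a `progress_percent` entry). The function names `parse_status_line` and `unchanged_for` are hypothetical helpers for illustration and are not part of milvus-backup.

```python
import re
from datetime import datetime

# Patterns for the fields visible in the status lines quoted in this issue.
TS_RE = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")
JOB_RE = re.compile(r"\[jobID=(\d+)\]")
STATE_RE = re.compile(r"\[state=(\w+)\]")
PROGRESS_RE = re.compile(r'"key":"progress_percent","value":"(\d+)"')


def parse_status_line(line):
    """Return (timestamp, job_id, state, progress), or None if a field is missing."""
    ts, job, state, progress = (
        r.search(line) for r in (TS_RE, JOB_RE, STATE_RE, PROGRESS_RE)
    )
    if not (ts and job and state and progress):
        return None
    when = datetime.strptime(ts.group(1), "%Y/%m/%d %H:%M:%S.%f")
    return when, job.group(1), state.group(1), int(progress.group(1))


def unchanged_for(lines):
    """Map each job ID to how long its latest (state, progress) pair has repeated."""
    first_seen = {}  # job_id -> (state, progress, time the pair first appeared)
    last_seen = {}   # job_id -> time of the most recent status line
    for line in lines:
        parsed = parse_status_line(line)
        if parsed is None:
            continue  # not a bulk insert status line
        when, job_id, state, progress = parsed
        previous = first_seen.get(job_id)
        if previous is None or previous[:2] != (state, progress):
            first_seen[job_id] = (state, progress, when)
        last_seen[job_id] = when
    return {job: last_seen[job] - first_seen[job][2] for job in first_seen}
```

Passing the lines of a saved milvus-backup log to `unchanged_for` gives one `timedelta` per job ID. A value that keeps growing at the same state and percentage suggests the job is waiting on something on the Milvus side, which is the point at which the datanode and indexnode logs are worth reading.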

huanghaoyuanhhy avatar May 07 '25 02:05 huanghaoyuanhhy

I encountered the same problem. Is there a solution?

leejoyful avatar Sep 01 '25 05:09 leejoyful

There are no exception logs in Milvus's datanode or indexnode. If the job is in the index-building state, it stays in that state for far too long.

leejoyful avatar Sep 05 '25 03:09 leejoyful

The restoration of 20 million records failed. The following is the error message:

[2025/09/05 14:12:33.613 +08:00] [ERROR] [core/backup_impl_restore_backup.go:114] ["execute restore collection fail"] [backupId=f2574991-8967-11f0-ae4a-7cc25575c23c] [error="backup: restore backup task execute fail, err: restore: execute restore: run collection task restore: wait collection worker pool restore: restore collection restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: context deadline exceeded"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).RestoreBackup\n\t/home/runner/work/milvus-backup/milvus-backup/core/backup_impl_restore_backup.go:114\ngithub.com/zilliztech/milvus-backup/cmd/restore.(*options).run\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/restore/restore.go:133\ngithub.com/zilliztech/milvus-backup/cmd/restore.NewCmd.func1\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/restore/restore.go:161\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1015\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1071\ngithub.com/zilliztech/milvus-backup/cmd.Execute\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/cmd.go:37\nmain.main\n\t/home/runner/work/milvus-backup/milvus-backup/main.go:20\nruntime.main\n\t/opt/hostedtoolcache/go/1.24.4/x64/src/runtime/proc.go:283"]

Error: restore backup failed: backup: restore backup task execute fail, err: restore: execute restore: run collection task restore: wait collection worker pool restore: restore collection restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: context deadline exceeded

leejoyful avatar Sep 05 '25 06:09 leejoyful

I'm using milvus-backup v0.5.7. My milvus cluster is v2.5.6. Both are containerized. I also encountered the same issue.

Error: restore backup failed: backup: restore backup task execute fail, err: restore: execute restore: run collection task restore: wait collection worker pool restore: restore collection restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: segment is not healthy
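The failures reported in this thread share the same long wrapper chain and differ only in the last segment ("context deadline exceeded" versus "segment is not healthy"). When comparing many runs, it can help to strip the wrappers. This is a minimal sketch, assuming each wrapper is joined to the next with ": " as in the messages above; `innermost_cause` is a hypothetical helper, not a milvus-backup function.

```python
from collections import Counter


def innermost_cause(error_text: str) -> str:
    """Return the last ': '-separated segment of a wrapped error message."""
    return error_text.strip().rsplit(": ", 1)[-1]


def tally_causes(error_lines):
    """Count how often each innermost cause appears across several error lines."""
    return Counter(innermost_cause(line) for line in error_lines if line.strip())
```

For example, calling `tally_causes` on the final `Error:` line of each failed restore shows at a glance whether the runs time out or hit unhealthy segments.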

leejoyful avatar Sep 08 '25 01:09 leejoyful

The following error will also occur: [2025/09/08 19:12:12.912 +08:00] [INFO] [restore/collection.go:728] ["bulk insert task state"] [restore_task_id=721e6d68-ab07-48e9-a157-5da5a2aa9100] [backup_ns=default.OpsMM_1536] [target_ns=default.OpsXX_1536] [jobID=460664905050889866] [state=ImportPending] [backup="[{"key":"failed_reason"},{"key":"progress_percent","value":"10"}]"]

Then it fails to run:

[2025/09/08 19:38:43.254 +08:00] [ERROR] [errgroup/errgroup.go:130] ["restore coll failed"] [backup_name=backup0902test] [backup_path=backup/backup0902test] [target_ns=default.OpsMM_1536] [error="restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: segment is not healthy"] [stack="golang.org/x/sync/errgroup.(*Group).add.func1\n\t/home/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:130"]

[2025/09/08 19:38:43.254 +08:00] [ERROR] [core/backup_impl_restore_backup.go:158] ["restore task failed"] [backup_name=backup0902test] [backup_path=backup/backup0902test] [error="restore: run collection task restore: wait collection worker pool restore: restore collection restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: segment is not healthy"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).executeRestoreBackupTask\n\t/home/runner/work/milvus-backup/milvus-backup/core/backup_impl_restore_backup.go:158\ngithub.com/zilliztech/milvus-backup/core.(*BackupContext).RestoreBackup\n\t/home/runner/work/milvus-backup/milvus-backup/core/backup_impl_restore_backup.go:111\ngithub.com/zilliztech/milvus-backup/cmd/restore.(*options).run\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/restore/restore.go:133\ngithub.com/zilliztech/milvus-backup/cmd/restore.NewCmd.func1\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/restore/restore.go:161\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1015\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1071\ngithub.com/zilliztech/milvus-backup/cmd.Execute\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/cmd.go:37\nmain.main\n\t/home/runner/work/milvus-backup/milvus-backup/main.go:20\nruntime.main\n\t/opt/hostedtoolcache/go/1.24.4/x64/src/runtime/proc.go:283"]

[2025/09/08 19:38:43.255 +08:00] [ERROR] [core/backup_impl_restore_backup.go:114] ["execute restore collection fail"] [backupId=31abf5c3-87ef-11f0-a504-7cc25575c23c] [error="backup: restore backup task execute fail, err: restore: execute restore: run collection task restore: wait collection worker pool restore: restore collection restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: segment is not healthy"] [stack="github.com/zilliztech/milvus-backup/core.(*BackupContext).RestoreBackup\n\t/home/runner/work/milvus-backup/milvus-backup/core/backup_impl_restore_backup.go:114\ngithub.com/zilliztech/milvus-backup/cmd/restore.(*options).run\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/restore/restore.go:133\ngithub.com/zilliztech/milvus-backup/cmd/restore.NewCmd.func1\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/restore/restore.go:161\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1015\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1071\ngithub.com/zilliztech/milvus-backup/cmd.Execute\n\t/home/runner/work/milvus-backup/milvus-backup/cmd/cmd.go:37\nmain.main\n\t/home/runner/work/milvus-backup/milvus-backup/main.go:20\nruntime.main\n\t/opt/hostedtoolcache/go/1.24.4/x64/src/runtime/proc.go:283"]

Error: restore backup failed: backup: restore backup task execute fail, err: restore: execute restore: run collection task restore: wait collection worker pool restore: restore collection restore_collection: restore data: restore_collection: restore data v1: restore_collection: restore partition data v1: restore_collection: restore not L0 groups: restore_collection: restore not L0 segment v1: restore_collection: bulk insert failed: segment is not healthy

leejoyful avatar Sep 08 '25 11:09 leejoyful