tidb-lightning
Exit with actual error log
Bug Report
Lightning fails and exits with a misleading error log:
[2020/11/26 13:12:59.344 +00:00] [ERROR] [main.go:83] ["tidb lightning encountered error stack info"] [error="restore table `db`.`table` failed: [1fc7d0aa-07f9-5f0b-a4a9-18610333ab04] import reach max retry 3 and still failed: could not find first pair, this shouldn't happen"] [errorVerbose="could not find first pair, this shouldn't happen\ngithub.com/pingcap/tidb-lightning/lightning/backend.(*local).readAndSplitIntoRange\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/backend/local.go:720\ngithub.com/pingcap/tidb-lightning/lightning/backend.(*local).ImportEngine\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/backend/local.go:1111\ngithub.com/pingcap/tidb-lightning/lightning/backend.(*ClosedEngine).Import\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/backend/backend.go:328\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).importKV\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1604\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).importEngine\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1140\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngines.func1\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:943\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\n[1fc7d0aa-07f9-5f0b-a4a9-18610333ab04] import reach max retry 3 and still failed\nrestore table `patsnapdata`.`t_workspace_folder_patent_1` failed"]
The actual error, logged a few minutes earlier, is "too many open files":
[2020/11/26 13:04:52.911 +00:00] [ERROR] [checkpoints.go:1089] ["save checkpoint failed"] [error="open /tmp/tidb_lightning_checkpoint.pb: too many open files"] [errorVerbose="open /tmp/tidb_lightning_checkpoint.pb: too many open files\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/agent/workspace/build_lightning_master/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/jenkins/agent/workspace/build_lightning_master/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/tidb-lightning/lightning/checkpoints.(*FileCheckpointsDB).save\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/checkpoints/checkpoints.go:890\ngithub.com/pingcap/tidb-lightning/lightning/checkpoints.(*FileCheckpointsDB).Update\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/checkpoints/checkpoints.go:1088\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*RestoreController).listenCheckpointUpdates.func1\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:512\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
@overvenus could you paste the full log file?
The listenCheckpointUpdates function is an async action, and its error is always ignored. So I think the first error is the real cause of this failure (though the root cause is probably still that there were not enough open file descriptors).
The issue is that "could not find first pair, this shouldn't happen" should provide the underlying cause. This should be fixed by #497.