Exit with actual error log

Open overvenus opened this issue 4 years ago • 2 comments

Bug Report

Lightning fails and exits with a misleading error log:

[2020/11/26 13:12:59.344 +00:00] [ERROR] [main.go:83] ["tidb lightning encountered error stack info"] [error="restore table `db`.`table` failed: [1fc7d0aa-07f9-5f0b-a4a9-18610333ab04] import reach max retry 3 and still failed: could not find first pair, this shouldn't happen"] [errorVerbose="could not find first pair, this shouldn't happen\ngithub.com/pingcap/tidb-lightning/lightning/backend.(*local).readAndSplitIntoRange\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/backend/local.go:720\ngithub.com/pingcap/tidb-lightning/lightning/backend.(*local).ImportEngine\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/backend/local.go:1111\ngithub.com/pingcap/tidb-lightning/lightning/backend.(*ClosedEngine).Import\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/backend/backend.go:328\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).importKV\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1604\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).importEngine\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1140\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngines.func1\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:943\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\n[1fc7d0aa-07f9-5f0b-a4a9-18610333ab04] import reach max retry 3 and still failed\nrestore table `patsnapdata`.`t_workspace_folder_patent_1` failed"]

The actual error, logged a few minutes earlier, is "too many open files":

[2020/11/26 13:04:52.911 +00:00] [ERROR] [checkpoints.go:1089] ["save checkpoint failed"] [error="open /tmp/tidb_lightning_checkpoint.pb: too many open files"] [errorVerbose="open /tmp/tidb_lightning_checkpoint.pb: too many open files\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/agent/workspace/build_lightning_master/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/jenkins/agent/workspace/build_lightning_master/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/tidb-lightning/lightning/checkpoints.(*FileCheckpointsDB).save\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/checkpoints/checkpoints.go:890\ngithub.com/pingcap/tidb-lightning/lightning/checkpoints.(*FileCheckpointsDB).Update\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/checkpoints/checkpoints.go:1088\ngithub.com/pingcap/tidb-lightning/lightning/restore.(*RestoreController).listenCheckpointUpdates.func1\n\t/home/jenkins/agent/workspace/build_lightning_master/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:512\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]

overvenus avatar Nov 27 '20 12:11 overvenus

@overvenus could you paste the full log file?

listenCheckpointUpdates runs asynchronously, and its error is always ignored, so it cannot be what made Lightning exit. I think the first error is the real cause of this failure (though the root cause is probably also running out of open file descriptors).

glorv avatar Nov 30 '20 03:11 glorv

The issue is that the "could not find first pair, this shouldn't happen" error should report the underlying cause. This should be fixed by #497.

kennytm avatar Nov 30 '20 09:11 kennytm