BR: backup 1TB tpcc data stuck at 99.98%

Open cyliu0 opened this issue 3 years ago • 4 comments

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error. Used br to back up 1 TB of TPC-C data to S3.

  2. What did you expect to see? The backup completes successfully.

  3. What did you see instead? br got stuck at 99.98% for hours and the log stopped updating. [image]

  4. What version of BR and TiDB/TiKV/PD are you using?

$ ./bin/br -V
Release Version: v5.0.0-nightly-19-gb6591ca1
Git Commit Hash: b6591ca141530b7efebad292ed6811353883c100
Git Branch: release-5.0
Go Version: go1.16.2
UTC Build Time: 2021-03-29 06:31:38
Race Enabled: false
  5. Operation logs

    • Please upload br.log for BR if possible: br-1t-backup.log

    • Please upload tidb-lightning.log for TiDB-Lightning if possible

    • Please upload tikv-importer.log from TiKV-Importer if possible

    • Other interesting logs

  6. Configuration of the cluster and the task

    • tidb-lightning.toml for TiDB-Lightning if possible
    • tikv-importer.toml for TiKV-Importer if possible
    • topology.yml if deployed by TiUP
  7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible

cyliu0 avatar Mar 30 '21 07:03 cyliu0

Goroutine dumping:

stuck-goroutines.txt

YuJuncen avatar Mar 30 '21 07:03 YuJuncen

It seems one call to backupClient.Recv is stuck.

Goroutines waiting chain:

  1. for files := range filesCh (client.go:454) (the key!)
  2. for err := range errCh (client.go:485) (the key!)
  3. eg.Wait() (client.go:477) (awaited by 2)
  4. receiving the backup stream (awaited by 3)

Possible reasons:

  • A gRPC bug that causes a deadlock under some conditions (e.g. packet loss?).
  • A TiKV bug: it doesn't reply properly after finishing the backup.
  • Other ghostlike things?

YuJuncen avatar Mar 30 '21 07:03 YuJuncen

It didn't happen again when I last retried, so it doesn't reproduce 100% of the time and not every user will hit it.

cyliu0 avatar Mar 31 '21 02:03 cyliu0

TiKV log: br-stuck-tikv-log.tar.gz

YuJuncen avatar Apr 06 '21 10:04 YuJuncen