BR: backup 1TB tpcc data stuck at 99.98%
Please answer these questions before submitting your issue. Thanks!
What did you do? If possible, provide a recipe for reproducing the error. Used br to back up 1 TB of TPC-C data to S3.
What did you expect to see? The backup completes successfully.
What did you see instead? br was stuck at 99.98% for hours and the log was no longer being updated.
What version of BR and TiDB/TiKV/PD are you using?
$ ./bin/br -V
Release Version: v5.0.0-nightly-19-gb6591ca1
Git Commit Hash: b6591ca141530b7efebad292ed6811353883c100
Git Branch: release-5.0
Go Version: go1.16.2
UTC Build Time: 2021-03-29 06:31:38
Race Enabled: false
Operation logs
Please upload for BR if possible: br-1t-backup.log
Please upload for TiDB-Lightning if possible: -
Please upload from TiKV-Importer if possible: -
Other interesting logs
Configuration of the cluster and the task
for TiDB-Lightning if possible -
for TiKV-Importer if possible -
if deployed by TiUP
Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible
It seems that one call to backupClient.Recv is stuck.
Goroutine waiting chain (each item is blocked on the one after it):
1. for files := range filesCh (client.go:454) (the key!)
2. for err := range errCh (client.go:485) (the key!)
3. eg.Wait() (client.go:477) (waited on by 2)
4. receiving backup stream (waited on by 3)
Possible reasons:
- A bug in gRPC that causes a deadlock under some conditions (e.g. packet loss?).
- A bug in TiKV: it doesn't reply properly after finishing the backup.
- Something else we haven't identified yet?
It didn't happen again on the last retry, so this is not something users will hit every time.
TiKV log: br-stuck-tikv-log.tar.gz