sysbench oltp_point_select test latency degraded after br backup and restore
Please answer these questions before submitting your issue. Thanks!
- What did you do? If possible, provide a recipe for reproducing the error.
  1. Use lightning to import 9 TB of data into TiDB (1 database, 3 tables, 3.75 T / 2.5 T / 2.5 T per table).
  2. Use br to back up the database, and run the sysbench oltp_point_select test at the same time.
  3. Drop the tables, use br to restore the backup to the original TiDB cluster, and run the sysbench oltp_point_select test at the same time.
  4. After the restore is finished, run the sysbench oltp_point_select test.
  5. Compare the sysbench test results.
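For reference, the backup step above (step 2) could look roughly like the sketch below; the PD endpoint, database name (`sbtest`), and storage path are placeholders, not values taken from this report:

```shell
# Sketch of the backup step, run while the read-only sysbench workload
# is active. PD address, database name, and backup destination are
# assumed placeholders.
bin/br backup db \
    --pd "172.16.6.6:2379" \
    --db sbtest \
    --storage "local:///data/br-backup" \
    --log-file br-backup.log
```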
- What did you expect to see?
  There should be no significant performance degradation after a br backup/restore.
- What did you see instead?
  Compared with the sysbench results during backup and during restore, the avg and 95th-percentile latencies after the br restore degraded significantly.

  | Latency (ms)    | backup | restore | after restore |
  | --------------- | ------ | ------- | ------------- |
  | avg             | 0.28   | 0.28    | 0.53          |
  | max             | 320.15 | 200.79  | 268706.60     |
  | 95th percentile | 0.38   | 0.36    | 0.63          |
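To quantify the regression, the numbers above can be compared directly (a quick arithmetic check on the reported figures, not part of the original measurement):

```python
# Compare the steady-state run during restore with the run after restore,
# using the latency and throughput figures reported by sysbench.
avg_during, avg_after = 0.28, 0.53        # avg latency (ms)
p95_during, p95_after = 0.36, 0.63        # 95th percentile latency (ms)
qps_during, qps_after = 3542.17, 1896.77  # queries per second

print(f"avg latency: {avg_after / avg_during:.2f}x worse")
print(f"p95 latency: {p95_after / p95_during:.2f}x worse")
print(f"throughput:  {1 - qps_after / qps_during:.0%} lower")
```

So the post-restore run shows roughly a 1.9x avg latency increase and close to half the throughput, which is far outside normal run-to-run variance for a point-select workload.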
=== sysbench result during backup ===

```
sysbench --mysql-host=172.16.6.6 --mysql-port=4000 --mysql-user=root --config-file=config oltp_point_select --tables=10 --table-size=1000000 --time=36000 run

SQL statistics:
    queries performed:
        read:                            127086779
        write:                           0
        other:                           0
        total:                           127086779
    transactions:                        127086779 (3530.19 per sec.)
    queries:                             127086779 (3530.19 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          36000.0005s
    total number of events:              127086779

Latency (ms):
         min:                                    0.18
         avg:                                    0.28
         max:                                  320.15
         95th percentile:                        0.38
         sum:                            35927114.30

Threads fairness:
    events (avg/stddev):           127086779.0000/0.00
    execution time (avg/stddev):   35927.1143/0.00
```

=== sysbench result during restore ===

```
sysbench --mysql-host=172.16.6.6 --mysql-port=4000 --mysql-user=root --config-file=config oltp_point_select --tables=10 --table-size=1000000 --time=21600 run

config: No such file or directory
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            76510850
        write:                           0
        other:                           0
        total:                           76510850
    transactions:                        76510850 (3542.17 per sec.)
    queries:                             76510850 (3542.17 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          21600.0007s
    total number of events:              76510850

Latency (ms):
         min:                                    0.17
         avg:                                    0.28
         max:                                  200.79
         95th percentile:                        0.36
         sum:                            21551168.55

Threads fairness:
    events (avg/stddev):           76510850.0000/0.00
    execution time (avg/stddev):   21551.1685/0.00
```

=== sysbench result after restore ===

```
sysbench --mysql-host=172.16.6.6 --mysql-port=4000 --mysql-user=root --config-file=config oltp_point_select --tables=10 --table-size=1000000 --time=7200 run

SQL statistics:
    queries performed:
        read:                            13656748
        write:                           0
        other:                           0
        total:                           13656748
    transactions:                        13656748 (1896.77 per sec.)
    queries:                             13656748 (1896.77 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          7200.0009s
    total number of events:              13656748

Latency (ms):
         min:                                    0.19
         avg:                                    0.53
         max:                               268706.60
         95th percentile:                        0.63
         sum:                             7191942.64

Threads fairness:
    events (avg/stddev):           13656748.0000/0.00
    execution time (avg/stddev):   7191.9426/0.00
```
- What version of BR and TiDB/TiKV/PD are you using?
  - br: v5.1.0-20210611
  - TiDB: v5.1.0-20210608
- Operation logs
  - Please upload `br.log` for BR if possible
  - Please upload `tidb-lightning.log` for TiDB-Lightning if possible
  - Please upload `tikv-importer.log` from TiKV-Importer if possible
  - Other interesting logs
- Configuration of the cluster and the task
  - `tidb-lightning.toml` for TiDB-Lightning if possible
  - `tikv-importer.toml` for TiKV-Importer if possible
  - `topology.yml` if deployed by TiUP
- Screenshot/exported PDF of the Grafana dashboard or metrics graph in Prometheus if possible
We suspect that the checksum step in the restore process flushes the block cache on the TiKV servers, which causes the oltp_point_select latency to increase. Further tests are needed to confirm this.
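If the checksum hypothesis holds, repeating the restore with the post-restore admin checksum disabled should not reproduce the latency regression. A minimal sketch of such a follow-up test, assuming br's `--checksum` switch is available in this version; the PD endpoint, database name, and storage path are placeholders:

```shell
# Restore the same backup with the post-restore admin checksum disabled,
# then rerun the read-only workload. PD address, database name, and
# storage path are assumed placeholders.
bin/br restore db \
    --pd "172.16.6.6:2379" \
    --db sbtest \
    --storage "local:///data/br-backup" \
    --checksum=false

# Rerun the same point-select workload once the restore finishes.
sysbench --mysql-host=172.16.6.6 --mysql-port=4000 --mysql-user=root \
    oltp_point_select --tables=10 --table-size=1000000 --time=7200 run
```

Watching the RocksDB block cache hit rate panels in the TiKV Grafana dashboards during both variants should also show whether the cache is being evicted by the checksum scan.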