tidb
executor: disable closest replica read if cluster is not balanced
What problem does this PR solve?
Issue Number: ref #35926
Problem Summary:
What is changed and how it works?
#35927 introduced a new replica_read type, closest-replica, which dispatches read requests to a store within the same AZ. In this mode, however, if read traffic is not evenly distributed across AZs, it may cause unbalanced load in tikv and affect overall performance.
This PR alleviates the problem by adding a periodic check while closest-replica is enabled. Every 60s, it checks that all AZs contain both tidb and tikv instances; if not, it disables closest-replica and falls back to leader read. This simple check avoids traffic skew in most cases.
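For illustration only, the per-AZ condition described above can be sketched as follows. This is a minimal sketch, not the code in this PR: the `instance` type, its `Kind` and `Zone` fields, and the `closestReadAllowed` function are hypothetical names, and treating an empty cluster as "not allowed" is an assumption.

```go
package main

import "fmt"

// instance is a hypothetical, simplified view of a cluster member.
// The real PR reads tidb servers and tikv stores from cluster metadata.
type instance struct {
	Kind string // "tidb" or "tikv"
	Zone string // AZ label, e.g. "az1"
}

// closestReadAllowed reports whether every AZ that hosts any instance
// hosts at least one tidb and at least one tikv instance.
// Assumption: an empty cluster view is treated as "not allowed".
func closestReadAllowed(instances []instance) bool {
	type counts struct{ tidb, tikv int }
	zones := make(map[string]*counts)
	for _, in := range instances {
		c, ok := zones[in.Zone]
		if !ok {
			c = &counts{}
			zones[in.Zone] = c
		}
		switch in.Kind {
		case "tidb":
			c.tidb++
		case "tikv":
			c.tikv++
		}
	}
	if len(zones) == 0 {
		return false
	}
	for _, c := range zones {
		if c.tidb == 0 || c.tikv == 0 {
			return false // this AZ lacks one kind: fall back to leader read
		}
	}
	return true
}

func main() {
	balanced := []instance{
		{Kind: "tidb", Zone: "az1"}, {Kind: "tikv", Zone: "az1"},
		{Kind: "tidb", Zone: "az2"}, {Kind: "tikv", Zone: "az2"},
	}
	skewed := []instance{
		{Kind: "tidb", Zone: "az1"}, {Kind: "tikv", Zone: "az1"},
		{Kind: "tikv", Zone: "az2"}, // az2 has no tidb
	}
	fmt.Println(closestReadAllowed(balanced)) // true
	fmt.Println(closestReadAllowed(skewed))   // false
}
```

In a periodic check, a function like this would be called on each tick (every 60s per the description) and its result used to decide between closest-replica and leader read.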
NOTE: in my benchmark, two problems may reduce the effectiveness of this check:
- The check depends on infosync.GetAllServerInfo to fetch all active tidb instances. This information is stored in etcd with a TTL of 10min. Because of #36793, tidb is likely to panic at exit and fail to delete itself from etcd, which may cause misjudgment.
- If the app uses long connections, traffic can't be dispatched to a newly started tidb because existing connections can't be moved. Thus, even though the cluster itself is balanced, the traffic is still uneven. This PR can't handle this kind of case.
Check List
Tests
- [x] Unit test
- [ ] Integration test
- [x] Manual test (add detailed scripts or steps below)
- [ ] No code
Side effects
- [ ] Performance regression: Consumes more CPU
- [ ] Performance regression: Consumes more Memory
- [ ] Breaking backward compatibility
Documentation
- [ ] Affects user behaviors
- [ ] Contains syntax changes
- [ ] Contains variable changes
- [ ] Contains experimental features
- [ ] Changes MySQL compatibility
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
None
[REVIEW NOTIFICATION]
This pull request has been approved by:
- nolouch
- qw4990
@qw4990 @nolouch PTAL, thank you
Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/0e349a3bc827fbc192fee308c42ad4d5e23b49eb
/build
@nolouch @qw4990 Could you please take a look? The related tidb-operator PR is merged now.
@qw4990 @winoros PTAL
/merge
This pull request has been accepted and is ready to merge.
TiDB MergeCI notify
🔴 Bad News! New failing [2] after this pr merged. These new failed integration tests seem to be caused by the current PR, please try to fix these new failed integration tests, thanks!
CI Name | Result | Duration | Compare with Parent commit |
---|---|---|---|
idc-jenkins-ci-tidb/integration-common-test | 🟥 failed 3, success 14, total 17 | 19 min | New failing |
idc-jenkins-ci-tidb/integration-ddl-test | 🟥 failed 1, success 5, total 6 | 5 min 18 sec | New failing |
idc-jenkins-ci/integration-cdc-test | 🟢 all 37 tests passed | 32 min | Existing passed |
idc-jenkins-ci-tidb/common-test | 🟢 all 11 tests passed | 20 min | Existing passed |
idc-jenkins-ci-tidb/tics-test | 🟢 all 1 tests passed | 6 min 44 sec | Existing passed |
idc-jenkins-ci-tidb/sqllogic-test-2 | 🟢 all 28 tests passed | 5 min 32 sec | Existing passed |
idc-jenkins-ci-tidb/sqllogic-test-1 | 🟢 all 26 tests passed | 4 min 34 sec | Existing passed |
idc-jenkins-ci-tidb/mybatis-test | 🟢 all 1 tests passed | 3 min 55 sec | Existing passed |
idc-jenkins-ci-tidb/integration-compatibility-test | 🟢 all 1 tests passed | 3 min 22 sec | Existing passed |
idc-jenkins-ci-tidb/plugin-test | 🟢 build success, plugin test success | 4min | Existing passed |
goleak found a leaked goroutine related to br:
[2022-09-07T03:39:52.041Z] goleak: Errors on successful test run: found unexpected goroutines:
[2022-09-07T03:39:52.041Z] [Goroutine 16818 in state select, with go.etcd.io/etcd/client/v3.waitRetryBackoff on top of the stack:
[2022-09-07T03:39:52.041Z] goroutine 16818 [select]:
[2022-09-07T03:39:52.041Z] go.etcd.io/etcd/client/v3.waitRetryBackoff({0x422dcb8, 0xc000bd9c80}, 0x4?, 0xc001e78600?)
[2022-09-07T03:39:52.041Z] /go/pkg/mod/go.etcd.io/etcd/client/[email protected]/retry_interceptor.go:302 +0xa5
[2022-09-07T03:39:52.041Z] go.etcd.io/etcd/client/v3.(*Client).unaryClientInterceptor.func1({0x422dc80?, 0xc00243c300?}, {0x3cb51a1, 0x16}, {0x3ba3260, 0xc002318000}, {0x3b5bf20, 0xc002256050}, 0xc000d3c000, 0x3dab8b8, ...)
[2022-09-07T03:39:52.041Z] /go/pkg/mod/go.etcd.io/etcd/client/[email protected]/retry_interceptor.go:50 +0x1fa
[2022-09-07T03:39:52.041Z] google.golang.org/grpc.(*ClientConn).Invoke(0x60?, {0x422dc80?, 0xc00243c300?}, {0x3cb51a1?, 0x6?}, {0x3ba3260?, 0xc002318000?}, {0x3b5bf20?, 0xc002256050?}, {0xc00243c360, ...})
[2022-09-07T03:39:52.041Z] /go/pkg/mod/google.golang.org/[email protected]/call.go:35 +0x223
[2022-09-07T03:39:52.041Z] go.etcd.io/etcd/api/v3/etcdserverpb.(*kVClient).Range(0xc000fb9fb0, {0x422dc80, 0xc00243c300}, 0xc002318000?, {0xc00243c360, 0x4, 0x6})
[2022-09-07T03:39:52.041Z] /go/pkg/mod/go.etcd.io/etcd/api/[email protected]/etcdserverpb/rpc.pb.go:6460 +0xc9
[2022-09-07T03:39:52.041Z] go.etcd.io/etcd/client/v3.(*retryKVClient).Range(0xc0021a3200, {0x422dc80, 0xc00243c300}, 0x691a80?, {0x6408be0, 0x3, 0x3})
[2022-09-07T03:39:52.041Z] /go/pkg/mod/go.etcd.io/etcd/client/[email protected]/retry.go:105 +0x133
[2022-09-07T03:39:52.041Z] go.etcd.io/etcd/client/v3.(*kv).Do(0xc001252d80, {_, _}, {0x1, {0xc0017be120, 0x11, 0x18}, {0xc0017be138, 0x11, 0x11}, ...})
[2022-09-07T03:39:52.041Z] /go/pkg/mod/go.etcd.io/etcd/client/[email protected]/kv.go:149 +0x1e8
[2022-09-07T03:39:52.041Z] go.etcd.io/etcd/client/v3.(*kv).Get(0x422dc48?, {0x422dc80, 0xc00243c300}, {0x3ca2f12?, 0x2?}, {0xc001502370?, 0x0?, 0x0?})
[2022-09-07T03:39:52.041Z] /go/pkg/mod/go.etcd.io/etcd/client/[email protected]/kv.go:119 +0xdc
[2022-09-07T03:39:52.041Z] github.com/pingcap/tidb/domain/infosync.getInfo({0x422dc48?, 0xc000328000?}, 0xc001818e00, {0x3ca2f12, 0x11}, 0x5, 0x30?, {0xc001502370, 0x1, 0x1})
[2022-09-07T03:39:52.041Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/domain/infosync/info.go:938 +0x17e
[2022-09-07T03:39:52.041Z] github.com/pingcap/tidb/domain/infosync.(*InfoSyncer).getAllServerInfo(0xc0025ce1c0, {0x422dc48, 0xc000328000})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/domain/infosync/info.go:594 +0xc7
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/domain/infosync.GetAllServerInfo({0x422dc48, 0xc000328000})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/domain/infosync/info.go:341 +0x45
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/infoschema.GetTiDBServerInfo({0xc001f88ad0?, 0x33afad9?})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/infoschema/tables.go:1647 +0x3c
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/infoschema.GetClusterServerInfo({0x4292410, 0xc001b79b80})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/infoschema/tables.go:1632 +0xf9
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.fetchClusterConfig({0x4292410, 0xc001b79b80}, 0xc001f88e88, 0xc001f88e88)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/memtable_reader.go:170 +0x70
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.(*ShowExec).fetchShowClusterConfigs(0xc00115e840, {0x0?, 0x400?})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/show.go:1253 +0x11e
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.(*ShowExec).fetchAll(0x4231ee0?, {0x422dcb8?, 0xc001bd89f0?})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/show.go:150 +0x18c
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.(*ShowExec).Next(0xc00115e840, {0x422dcb8, 0xc001bd89f0}, 0xc0007a64b0)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/show.go:115 +0xc8
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.Next({0x422dcb8, 0xc001bd89f0}, {0x4231ee0, 0xc00115e840}, 0xc0007a64b0)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/executor.go:324 +0x4f2
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.(*SelectionExec).Next(0xc001eb8410, {0x422dcb8, 0xc001bd89f0}, 0xc0007a6640)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/executor.go:1560 +0xf7
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.Next({0x422dcb8, 0xc001bd89f0}, {0x4231d20, 0xc001eb8410}, 0xc0007a6640)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/executor.go:324 +0x4f2
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.(*ExecStmt).next(0xc002926870, {0x422dcb8, 0xc001bd89f0}, {0x4231d20, 0xc001eb8410}, 0xc0006fa800?)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/adapter.go:937 +0x78
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/executor.(*recordSet).Next(0xc0007a65f0, {0x422dcb8?, 0xc001bd89f0?}, 0xc0007a6640)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/executor/adapter.go:152 +0xc5
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/session.drainRecordSet({0x422dcb8, 0xc001bd89f0}, 0xc001b79b80, {0x422e540, 0xc001bd93b0}, {0x0?, 0x0?})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/session/session.go:1284 +0xea
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/session.(*session).ExecRestrictedSQL.func1({0x422dcb8, 0xc001bd8990}, 0xc001b79b80)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/session/session.go:1940 +0x2f7
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/session.(*session).withRestrictedSQLExecutor(0x38469e0?, {0x422dcb8, 0xc001bd8990}, {0x0, 0x0, 0xc000328000?}, 0xc0013b38c0)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/session/session.go:1913 +0x2e8
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/session.(*session).ExecRestrictedSQL(0xafef70f690bb78f5?, {0x422dcb8?, 0xc001bd8990?}, {0x0?, 0x0?, 0x0?}, {0x3d082b1?, 0xc000ef7440?}, {0x0, 0x0, ...})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/session/session.go:1917 +0x8e
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/br/pkg/utils.IsLogBackupEnabled({0x7fa21c329cd8, 0xc001b78780})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/br/pkg/utils/db.go:72 +0xa2
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/br/pkg/utils.CheckLogBackupEnabled({0x4292410?, 0xc001b78780?})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/br/pkg/utils/db.go:54 +0x56
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/store/gcworker.(*GCWorker).checkLeader(0xc000d28000, {0x422dcb8, 0xc00265f530})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/store/gcworker/gc_worker.go:1793 +0x12f
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/store/gcworker.(*GCWorker).tick(0xc0013b3e60?, {0x422dcb8, 0xc00265f530})
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/store/gcworker/gc_worker.go:286 +0x45
[2022-09-07T03:39:52.042Z] github.com/pingcap/tidb/store/gcworker.(*GCWorker).start(0xc000d28000, {0x422dcb8, 0xc00265f530}, 0xc000328008?)
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/store/gcworker/gc_worker.go:229 +0x4e5
[2022-09-07T03:39:52.042Z] created by github.com/pingcap/tidb/store/gcworker.(*GCWorker).Start
[2022-09-07T03:39:52.042Z] /home/jenkins/agent/workspace/tidb_ghpr_integration_ddl_test/go/src/github.com/pingcap/tidb/store/gcworker/gc_worker.go:120 +0x118
[2022-09-07T03:39:52.042Z]
[2022-09-07T03:39:52.042Z] ]
/cc @3pointer https://ci.pingcap.net/blue/organizations/jenkins/tidb_ghpr_integration_ddl_test/detail/tidb_ghpr_integration_ddl_test/11277/pipeline