Goroutine blocked when getting TSO

Open Lily2025 opened this issue 9 months ago • 3 comments

Bug Report

What did you do?

  1. Enable tidb_enable_tso_follower_proxy.
  2. Enable enable-forwarding: "tidb": "enable-forwarding = true", "tikv": "[pd]\nenable-forwarding = true".
  3. Run TPC-C.
  4. Run some HA cases.

What did you expect to see?

The workload runs normally.

What did you see instead?

After running some HA cases, queries are blocked. The blocked goroutine's stack:

#	0x2b47e9c	github.com/tikv/pd/client/clients/tso.(*Request).waitCtx+0x1bc	/root/go/pkg/mod/github.com/tikv/pd/[email protected]/clients/tso/request.go:82
#	0x2b47cb9	github.com/tikv/pd/client/clients/tso.(*Request).Wait+0x19	/root/go/pkg/mod/github.com/tikv/pd/[email protected]/clients/tso/request.go:73
#	0x2b7494e	github.com/tikv/client-go/v2/util.interceptedTsFuture.Wait+0x6e	/root/go/pkg/mod/github.com/tikv/client-go/[email protected]/util/pd_interceptor.go:76
#	0x2ce2f61	github.com/tikv/client-go/v2/oracle/oracles.(*tsFuture).Wait+0x41	/root/go/pkg/mod/github.com/tikv/client-go/[email protected]/oracle/oracles/pd.go:238
#	0x5b39819	github.com/pingcap/tidb/pkg/session.(*txnFuture).wait+0xd9	/workspace/source/tidb/pkg/session/txn.go:684
#	0x5b36a09	github.com/pingcap/tidb/pkg/session.(*LazyTxn).changePendingToValid+0xe9	/workspace/source/tidb/pkg/session/txn.go:293
#	0x5b38ecf	github.com/pingcap/tidb/pkg/session.(*LazyTxn).Wait+0x16f	/workspace/source/tidb/pkg/session/txn.go:609
#	0x5ae86a6	github.com/pingcap/tidb/pkg/sessiontxn/isolation.(*baseTxnContextProvider).ActivateTxn+0xc6	/workspace/source/tidb/pkg/sessiontxn/isolation/base.go:299
#	0x5ae7d26	github.com/pingcap/tidb/pkg/sessiontxn/isolation.(*baseTxnContextProvider).OnInitialize+0x566	/workspace/source/tidb/pkg/sessiontxn/isolation/base.go:146
#	0x5b3aabb	github.com/pingcap/tidb/pkg/session.(*txnManager).EnterNewTxn+0x5b	/workspace/source/tidb/pkg/session/txnmanager.go:161
#	0x5a024e5	github.com/pingcap/tidb/pkg/executor.(*SimpleExec).executeBegin+0x1e5	/workspace/source/tidb/pkg/executor/simple.go:646
#	0x59fcf64	github.com/pingcap/tidb/pkg/executor.(*SimpleExec).Next+0x524	/workspace/source/tidb/pkg/executor/simple.go:161
#	0x4f075be	github.com/pingcap/tidb/pkg/executor/internal/exec.Next+0x29e	/workspace/source/tidb/pkg/executor/internal/exec/executor.go:460
#	0x5873ced	github.com/pingcap/tidb/pkg/executor.(*ExecStmt).next+0x6d	/workspace/source/tidb/pkg/executor/adapter.go:1269
#	0x58719d4	github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelayExecutor+0x3b4	/workspace/source/tidb/pkg/executor/adapter.go:1018
#	0x5870378	github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelay+0x238	/workspace/source/tidb/pkg/executor/adapter.go:851
#	0x586e477	github.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec+0xed7	/workspace/source/tidb/pkg/executor/adapter.go:614
....

What version of PD are you using (pd-server -V)?

./pd-server -V
Release Version: v9.0.0-beta.1
Edition: Community
Git Commit Hash: 110f73c7c28722c88539b6f7fc29248b3adf3010
Git Branch: HEAD
UTC Build Time: 2025-03-17 10:29:29
2025-03-20T09:06:17.779+0800 INFO k8s/client.go:135 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129

./tidb-server -V
Release Version: v9.0.0-beta.1
Edition: Community
Git Commit Hash: dd701afad7b2781ea92265f4d5d68c3eb28bcfdb
Git Branch: HEAD
UTC Build Time: 2025-03-19 15:19:44
GoVersion: go1.23.7
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2025-03-20T09:06:19.747+0800

Lily2025 commented on Mar 20 '25 10:03

/type bugrleungx

Lily2025 commented on Mar 20 '25 10:03

@Lily2025: The label(s) type/bugrleungx cannot be applied, because the repository doesn't have them.

In response to this:

/type bugrleungx

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot[bot] commented on Mar 20 '25 10:03

/severity major
/assign rleungx

Lily2025 commented on Mar 20 '25 10:03

/remove-severity major
/severity critical

Lily2025 commented on Apr 18 '25 03:04

The trigger conditions (all three must hold):

  1. The cluster has enable-follower-tso-proxy enabled.
  2. The cluster has more than one PD server.
  3. The PD leader has changed.

bufferflies commented on Apr 18 '25 03:04

The root cause: The connection context has been cancelled, but the stream context is different from the connection context, so the stream will not cancel the pending request.

bufferflies commented on Apr 18 '25 03:04

The root cause: The connection context has been cancelled, but the stream context is different from the connection context, so the stream will not cancel the pending request.

@bufferflies In Go, if you cancel a parent context (the connection ctx in this case), all of its child contexts are cancelled automatically (the stream context in this case: cctx, cancel := context.WithCancel(ctx)).

Tema commented on Apr 18 '25 16:04