pdms: support primary/transfer api for scheduling and tso
What problem does this PR solve?
For tiup
- When we have 3 pdms, pdms-0/pdms-1/pdms-2, and pdms-2 is the primary
- Upgrading pdms-2 first may transfer the primary to pdms-0
- Upgrading pdms-0 then transfers the primary again
Upgrading the pdms primary last (the "defer" feature) avoids these unnecessary primary transfers.
Ref https://github.com/pingcap/tiup/pull/2414
For operator
tidb-operator cannot defer the primary; it can only upgrade the pods in order.
Furthermore, consider this situation:
- When we have 3 pdms, pdms-0/pdms-1/pdms-2, and pdms-2 is the primary
- Upgrading pdms-2 first may transfer the primary to pdms-1
- Upgrading pdms-1 may then transfer the primary to pdms-0, which has not been upgraded yet, so upgrading pdms-0 transfers it once more.
To fix this, assume that the current primary's ordinal is x and the ordinal range is [0, n]:
- Find the largest suitable ordinal in (x, n], because those pods have already been upgraded.
- If there is no suitable ordinal there, find the smallest suitable ordinal in [0, x) to reduce the number of transfers.
Ref https://github.com/pingcap/tidb-operator/pull/5643
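The selection rule above can be sketched in Go as follows. `pickNewPrimary` and the `suitable` predicate (for example, "this member is healthy") are hypothetical names for illustration, not tidb-operator's actual code; -1 signals that no member qualifies.

```go
package main

import "fmt"

// pickNewPrimary is a hypothetical helper: x is the current primary's ordinal,
// ordinals range over [0, n], and suitable reports whether ordinal i may
// become the primary. It returns the chosen ordinal, or -1 if none qualifies.
func pickNewPrimary(x, n int, suitable func(int) bool) int {
	// Prefer the largest suitable ordinal in (x, n]: those pods are already upgraded.
	for i := n; i > x; i-- {
		if suitable(i) {
			return i
		}
	}
	// Otherwise take the smallest suitable ordinal in [0, x): it is upgraded
	// last, which keeps the number of further transfers low.
	for i := 0; i < x; i++ {
		if suitable(i) {
			return i
		}
	}
	return -1
}

func main() {
	all := func(int) bool { return true }
	fmt.Println(pickNewPrimary(2, 2, all)) // 0
	fmt.Println(pickNewPrimary(0, 2, all)) // 2
	fmt.Println(pickNewPrimary(0, 2, func(i int) bool { return i != 2 })) // 1
}
```

For the scenario above (pdms-2 is the primary and is upgraded first), the sketch picks pdms-0; upgrading pdms-1 then needs no transfer, and upgrading pdms-0 moves the primary to the already-upgraded pdms-2, so two transfers instead of three.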
Issue Number: Close #7995, Ref #5766, #7519
What is changed and how does it work?
- Add `primaryWatch`, which is used only for watching the primary and exists so that the `Watch` interface in `Leadership` can be reused. `primaryWatch` watches whether the `/ms/primary/transfer` API has changed the primary. If it has, the old primary will:
  - modify the expected primary flag to the new primary
  - modify its memory status
  - exit the primary watch loop
  - delete the leader key
- The expected primary flag is the path that stores the expected primary. It is ONLY triggered by the `/ms/primary/transfer` API.
  - This flag acts like a fence that prevents two primaries from existing in the cluster simultaneously.
  - Since a follower campaigns for a new primary as soon as it finds that the `leader_key` has been deleted, we must ensure `expected_primary` is set before the `leader_key` is deleted.
  - The old primary therefore sets `expected_primary` first, then deletes the `leader_key`, which triggers the followers to campaign for a new primary.
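The fence can be expressed as a small predicate. `mayCampaign` is a hypothetical helper, and treating an empty `expected_primary` as "no transfer in progress" is an assumption of this sketch, not PD's documented behavior. It shows why the order matters: once the flag names tso-0, tso-1 is kept out even though the `leader_key` is gone, whereas deleting the `leader_key` before the flag is written would let any follower win.

```go
package main

import "fmt"

// mayCampaign is a hypothetical predicate: should member self start a campaign?
// leaderKeyExists models the leader_key; expectedPrimary models the
// expected_primary flag ("" is assumed to mean no transfer is in progress).
func mayCampaign(leaderKeyExists bool, expectedPrimary, self string) bool {
	if leaderKeyExists {
		return false // a primary still holds leadership
	}
	if expectedPrimary == "" {
		return true // no fence: any follower may campaign
	}
	return expectedPrimary == self // fence: only the expected primary may campaign
}

func main() {
	// Correct order: flag written first, then leader_key deleted.
	fmt.Println(mayCampaign(false, "tso-0", "tso-1")) // false
	fmt.Println(mayCampaign(false, "tso-0", "tso-0")) // true
	// Wrong order: leader_key deleted before the flag is written.
	fmt.Println(mayCampaign(false, "", "tso-1")) // true
}
```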
- Support the `/ms/primary/transfer` API to change the primary, for example:
$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/tso'
"http://127.0.0.1:2382"
$ curl --location --request POST 'http://127.0.0.1:2379/pd/api/v2/ms/primary/transfer/tso' \
--header 'Content-Type: application/json' \
--data-raw '{
"new_primary": "tso-0"
}'
"success"
$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/tso'
"http://127.0.0.1:2384"
$ curl --location --request POST 'http://127.0.0.1:2379/pd/api/v2/ms/primary/transfer/tso' \
--header 'Content-Type: application/json' \
--data-raw '{
"new_primary": ""
}'
"success"
$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/tso'
"http://127.0.0.1:2382"
The members info can be retrieved with
$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/members/tso'
which returns
[
{
"name": "tso-0",
"service-addr": "http://127.0.0.1:2384",
"version": "v8.2.0-alpha-23-gdd72b9c19-dirty",
"git-hash": "dd72b9c19707ccbdb1801d379b3982a7944df23f",
"deploy-path": "/Users/pingcap/CS/PingCAP/pd/bin",
"start-timestamp": 1715577605,
"member-value": "ChtodHRwOi8vMTI3LjAuMC4xOjIzODQtMDAwMDAQp+L2iMCp3NUaGhVodHRwOi8vMTI3LjAuMC4xOjIzODQ="
},
{
"name": "tso-1",
"service-addr": "http://127.0.0.1:2386",
"version": "v8.2.0-alpha-23-gdd72b9c19-dirty",
"git-hash": "dd72b9c19707ccbdb1801d379b3982a7944df23f",
"deploy-path": "/Users/pingcap/CS/PingCAP/pd/bin",
"start-timestamp": 1715577605,
"member-value": "ChtodHRwOi8vMTI3LjAuMC4xOjIzODYtMDAwMDAQj9ro5Yq9mY8mGhVodHRwOi8vMTI3LjAuMC4xOjIzODY="
}
]
Check List
Tests
- Unit test
- Manual test (add detailed scripts or steps below)
Release note
None.
Codecov Report
Attention: Patch coverage is 75.70093% with 52 lines in your changes missing coverage. Please review.
Project coverage is 77.49%. Comparing base (3f32f54) to head (2d9a3b0). Report is 2 commits behind head on master.
@@ Coverage Diff @@
## master #8157 +/- ##
==========================================
+ Coverage 77.40% 77.49% +0.08%
==========================================
Files 472 473 +1
Lines 61821 61934 +113
==========================================
+ Hits 47854 47993 +139
+ Misses 10400 10373 -27
- Partials 3567 3568 +1
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 77.49% <75.70%> (+0.08%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
@JmPotato @lhy1024 PTAL, thx!
@rleungx @JmPotato @lhy1024 friendly ping :)
/hold Need to prepare the tiup & operator PRs
@JmPotato @rleungx Can you put in a lgtm to indicate that you have agreed? Then I'll cancel the hold label. :) Thx!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: JmPotato, rleungx
- ~~OWNERS~~ [JmPotato,rleungx]
Completed manual testing of operator and tiup
/unhold
@HuSharp: Your PR was out of date, I have automatically updated it for you.