pdms: support primary/transfer api for scheduling and tso

Open HuSharp opened this issue 1 year ago • 9 comments

What problem does this PR solve?

For tiup

  1. When we have 3 pdms instances, pdms-0/pdms-1/pdms-2, and pdms-2 is the primary
  2. Upgrading pdms-2 first may transfer the primary to pdms-0
  3. Upgrading pdms-0 will then transfer the primary again

Upgrading the pdms primary last (the "defer" feature) avoids these unnecessary primary transfers.

Ref https://github.com/pingcap/tiup/pull/2414

For operator

tidb-operator has no equivalent of the defer feature; it can only upgrade the pods in order.

Furthermore, consider this situation:

  1. We have 3 pdms instances, pdms-0/pdms-1/pdms-2, and pdms-2 is the primary
  2. Upgrading pdms-2 first may transfer the primary to pdms-1
  3. Upgrading pdms-1 may then transfer the primary to pdms-0

To fix this, assume the current primary's ordinal is x and the ordinals range over [0, n]:

  1. Find the max suitable ordinal in (x, n], because those pods have already been upgraded
  2. If there is no suitable ordinal, find the min suitable ordinal in [0, x) to reduce the number of transfers

Ref https://github.com/pingcap/tidb-operator/pull/5643
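
The two-step selection rule for the operator can be sketched as below. It is a minimal illustration under assumptions: `pickTransferTarget` and the `suitable` predicate (for example "the pod is healthy") are hypothetical names and do not come from tidb-operator.

```go
package main

import "fmt"

// pickTransferTarget chooses the ordinal that should receive the primary
// before the pod with ordinal x is upgraded. Ordinals range over [0, n].
// suitable is a hypothetical predicate supplied by the caller.
func pickTransferTarget(x, n int, suitable func(ordinal int) bool) (int, bool) {
	// Step 1: prefer the largest suitable ordinal in (x, n];
	// those pods have already been upgraded.
	for i := n; i > x; i-- {
		if suitable(i) {
			return i, true
		}
	}
	// Step 2: otherwise take the smallest suitable ordinal in [0, x),
	// which is upgraded as late as possible.
	for i := 0; i < x; i++ {
		if suitable(i) {
			return i, true
		}
	}
	return -1, false
}

func main() {
	all := func(int) bool { return true }

	target, ok := pickTransferTarget(2, 2, all) // primary is pdms-2
	fmt.Println(target, ok)                     // prints 0 true

	target, ok = pickTransferTarget(0, 2, all) // primary is pdms-0
	fmt.Println(target, ok)                    // prints 2 true
}
```

In the three-pod scenario this gives two transfers (pdms-2 to pdms-0, then pdms-0 to pdms-2) instead of one transfer per upgraded pod.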

Issue Number: Close #7995, Ref #5766, #7519

What is changed and how does it work?

  1. Add primaryWatch, used only for watching the primary, so that the Watch interface in Leadership can be reused.
  2. primaryWatch watches whether the /ms/primary/transfer API has changed the primary. When it has, the old primary will:
    1. modify the expected primary flag to point to the new primary
    2. modify the in-memory status
    3. exit the primary watch loop
    4. delete the leader key
  3. The expected primary flag is the path that stores the expected primary. It is ONLY triggered by the /ms/primary/transfer API.
    • This flag acts like a fence to avoid having 2 primaries in the cluster simultaneously.
    • A follower campaigns for a new primary as soon as it finds that the leader_key has been deleted.
    • Therefore we must ensure expected_primary is set before the leader_key is deleted.
    • The old primary sets expected_primary first, then deletes the leader_key, which triggers the followers to campaign for a new primary.
  4. Support the /ms/primary/transfer API to change the primary:
$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/tso'
"http://127.0.0.1:2382"

$ curl --location --request POST 'http://127.0.0.1:2379/pd/api/v2/ms/primary/transfer/tso' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "new_primary": "tso-0"
}'
"success"

$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/tso'
"http://127.0.0.1:2384"

$ curl --location --request POST 'http://127.0.0.1:2379/pd/api/v2/ms/primary/transfer/tso' \
--header 'Content-Type: application/json' \
--data-raw '{
    "new_primary": ""
}'
"success"

$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/primary/tso'
"http://127.0.0.1:2382"
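
The ordering behind the expected primary flag (set expected_primary first, delete the leader_key second) can be illustrated with a small in-memory model. This is a sketch under assumptions, not PD's code: the map stands in for the real store, the key names are placeholders, and the follower-side check in `tryCampaign` is an assumed way a follower could honor the flag.

```go
package main

import "fmt"

// kv is an in-memory stand-in for the backing store (assumption).
type kv map[string]string

// Placeholder key names, chosen for the example only.
const (
	leaderKey       = "leader_key"
	expectedPrimary = "expected_primary"
)

// transferPrimary follows the order described in the PR: record the expected
// primary first, then delete the leader key.
func transferPrimary(store kv, newPrimary string) {
	store[expectedPrimary] = newPrimary
	delete(store, leaderKey)
}

// tryCampaign is an assumed follower-side check: a member only takes the
// leader key if it is free and the expected primary is unset or names it.
func tryCampaign(store kv, member string) bool {
	if _, held := store[leaderKey]; held {
		return false
	}
	if want := store[expectedPrimary]; want != "" && want != member {
		return false
	}
	store[leaderKey] = member
	return true
}

func main() {
	store := kv{leaderKey: "tso-1"}
	transferPrimary(store, "tso-0")

	fmt.Println(tryCampaign(store, "tso-2")) // prints false
	fmt.Println(tryCampaign(store, "tso-0")) // prints true
	fmt.Println(store[leaderKey])            // prints tso-0
}
```

Because the flag is written before the leader key disappears, a follower that notices the deletion can already see which member is expected to win.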

The member info can be fetched with:

$ curl --location --request GET 'http://127.0.0.1:2379/pd/api/v2/ms/members/tso'

which returns:

[
    {
        "name": "tso-0",
        "service-addr": "http://127.0.0.1:2384",
        "version": "v8.2.0-alpha-23-gdd72b9c19-dirty",
        "git-hash": "dd72b9c19707ccbdb1801d379b3982a7944df23f",
        "deploy-path": "/Users/pingcap/CS/PingCAP/pd/bin",
        "start-timestamp": 1715577605,
        "member-value": "ChtodHRwOi8vMTI3LjAuMC4xOjIzODQtMDAwMDAQp+L2iMCp3NUaGhVodHRwOi8vMTI3LjAuMC4xOjIzODQ="
    },
    {
        "name": "tso-1",
        "service-addr": "http://127.0.0.1:2386",
        "version": "v8.2.0-alpha-23-gdd72b9c19-dirty",
        "git-hash": "dd72b9c19707ccbdb1801d379b3982a7944df23f",
        "deploy-path": "/Users/pingcap/CS/PingCAP/pd/bin",
        "start-timestamp": 1715577605,
        "member-value": "ChtodHRwOi8vMTI3LjAuMC4xOjIzODYtMDAwMDAQj9ro5Yq9mY8mGhVodHRwOi8vMTI3LjAuMC4xOjIzODY="
    }
]
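
To check a transfer result, the member name passed as new_primary can be mapped to its service address using the members response above. The following is a small sketch: the `member` struct and `serviceAddr` helper are names invented for this example, and only the `name` and `service-addr` fields from the JSON shown above are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// member holds the two fields of the members response used here.
type member struct {
	Name        string `json:"name"`
	ServiceAddr string `json:"service-addr"`
}

// serviceAddr returns the service address of the named member.
// (Hypothetical helper for illustration.)
func serviceAddr(membersJSON []byte, name string) (string, error) {
	var members []member
	if err := json.Unmarshal(membersJSON, &members); err != nil {
		return "", err
	}
	for _, m := range members {
		if m.Name == name {
			return m.ServiceAddr, nil
		}
	}
	return "", fmt.Errorf("member %q not found", name)
}

func main() {
	// Trimmed copy of the members response shown above.
	payload := []byte(`[
		{"name": "tso-0", "service-addr": "http://127.0.0.1:2384"},
		{"name": "tso-1", "service-addr": "http://127.0.0.1:2386"}
	]`)
	addr, err := serviceAddr(payload, "tso-0")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr) // prints http://127.0.0.1:2384
}
```

The address for tso-0 matches the primary reported by the GET request issued after the transfer to "tso-0".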

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Release note

None.

HuSharp • May 09 '24 03:05

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.

ti-chi-bot[bot] • May 09 '24 03:05

Codecov Report

Attention: Patch coverage is 75.70093% with 52 lines in your changes missing coverage. Please review.

Project coverage is 77.49%. Comparing base (3f32f54) to head (2d9a3b0). Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8157      +/-   ##
==========================================
+ Coverage   77.40%   77.49%   +0.08%     
==========================================
  Files         472      473       +1     
  Lines       61821    61934     +113     
==========================================
+ Hits        47854    47993     +139     
+ Misses      10400    10373      -27     
- Partials     3567     3568       +1     
Flag Coverage Δ
unittests 77.49% <75.70%> (+0.08%) :arrow_up:

codecov[bot] • May 09 '24 13:05

@JmPotato @lhy1024 PTAL, thx!

HuSharp • Jun 13 '24 06:06

@rleungx @JmPotato @lhy1024 friendly ping :)

HuSharp • Jun 27 '24 06:06

[LGTM Timeline notifier]

Timeline:

  • 2024-07-04 08:12:07.112911669 +0000 UTC m=+1484853.598400502: :ballot_box_with_check: agreed by JmPotato.
  • 2024-07-08 08:11:01.07454411 +0000 UTC m=+258758.309778223: :ballot_box_with_check: agreed by rleungx.

ti-chi-bot[bot] • Jul 08 '24 08:07

/hold Need to prepare tiup&operator pr

HuSharp • Jul 11 '24 06:07

@JmPotato @rleungx Can you put in a lgtm to indicate that you have agreed? Then I'll cancel the hold label. :) Thx!

HuSharp • Aug 12 '24 09:08

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JmPotato, rleungx

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [JmPotato,rleungx]

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

ti-chi-bot[bot] • Aug 13 '24 02:08

Completed manual testing of operator and tiup

HuSharp • Aug 13 '24 04:08

/unhold

HuSharp • Aug 13 '24 04:08

@HuSharp: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

ti-chi-bot[bot] • Aug 13 '24 04:08