cluster: support IngoreInitConfigComps

Open Smityz opened this issue 3 years ago • 13 comments

What problem does this PR solve?

If a cluster has a large number of tikv-server instances, generating the config takes a long time (several hours).

What is changed and how it works?

./tiup-cluster scale-in Kvstore_UAT_0 --node <IP:port> -y --ignore-config-roles tikv
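
As a rough illustration of the idea behind the flag (not the code in this PR), the sketch below filters out instances whose role is listed in the flag value before configs are regenerated. The `Instance` type and the `parseIgnoreRoles` and `instancesNeedingConfig` helpers are hypothetical names, and accepting a comma-separated list of roles is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Instance is a hypothetical stand-in for a cluster instance; it is not tiup's real type.
type Instance struct {
	Role string // e.g. "pd", "tikv", "tidb"
	Host string
}

// parseIgnoreRoles turns a flag value such as "tikv" or "tikv,tidb"
// into a set of lower-cased role names. Empty entries are dropped.
func parseIgnoreRoles(value string) map[string]struct{} {
	roles := make(map[string]struct{})
	for _, r := range strings.Split(value, ",") {
		r = strings.ToLower(strings.TrimSpace(r))
		if r != "" {
			roles[r] = struct{}{}
		}
	}
	return roles
}

// instancesNeedingConfig returns the instances whose role is not ignored,
// preserving the input order.
func instancesNeedingConfig(all []Instance, ignore map[string]struct{}) []Instance {
	var kept []Instance
	for _, inst := range all {
		if _, skip := ignore[strings.ToLower(inst.Role)]; skip {
			continue
		}
		kept = append(kept, inst)
	}
	return kept
}

func main() {
	// Made-up topology, for illustration only.
	topology := []Instance{
		{Role: "pd", Host: "10.0.0.1"},
		{Role: "tikv", Host: "10.0.0.2"},
		{Role: "tikv", Host: "10.0.0.3"},
		{Role: "prometheus", Host: "10.0.0.4"},
	}
	for _, inst := range instancesNeedingConfig(topology, parseIgnoreRoles("tikv")) {
		fmt.Printf("regenerate config for %s on %s\n", inst.Role, inst.Host)
	}
}
```

With hundreds of TiKV nodes, skipping that one role removes most of the per-node work, which is where the time goes.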

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release notes:

NONE

Smityz avatar Jul 14 '22 11:07 Smityz

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.

ti-chi-bot avatar Jul 14 '22 11:07 ti-chi-bot

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 14 '22 11:07 CLAassistant

Codecov Report

Base: 56.31% // Head: 50.94% // Decreases project coverage by -5.37% :warning:

Coverage data is based on head (ad17bd3) compared to base (9e2e464). Patch coverage: 100.00% of modified lines in pull request are covered.

:exclamation: Current head ad17bd3 differs from pull request most recent head 286e65e. Consider uploading reports for the commit 286e65e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1987      +/-   ##
==========================================
- Coverage   56.31%   50.94%   -5.37%     
==========================================
  Files         313      312       -1     
  Lines       33492    33481      -11     
==========================================
- Hits        18858    17055    -1803     
- Misses      12415    14212    +1797     
+ Partials     2219     2214       -5     
Flag      Coverage Δ
tiup      16.17% <ø> (ø)
unittest  ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                           Coverage Δ
pkg/cluster/operation/operation.go       80.65% <ø> (ø)
components/cluster/command/prune.go      59.09% <100.00%> (ø)
components/cluster/command/scale_in.go   75.00% <100.00%> (ø)
components/cluster/command/scale_out.go  74.29% <100.00%> (ø)
pkg/cluster/manager/builder.go           67.20% <100.00%> (ø)
components/dm/ansible/worker.go          0.00% <0.00%> (-100.00%) :arrow_down:
pkg/meta/err.go                          0.00% <0.00%> (-76.19%) :arrow_down:
pkg/cluster/api/error.go                 0.00% <0.00%> (-75.00%) :arrow_down:
pkg/crypto/rand/passwd.go                0.00% <0.00%> (-75.00%) :arrow_down:
pkg/telemetry/node_info.go               0.00% <0.00%> (-70.73%) :arrow_down:
... and 53 more

codecov-commenter avatar Jul 14 '22 11:07 codecov-commenter

  - Download blackbox_exporter: (linux/amd64) ... Done
  - Download node_exporter: (linux/amd64) ... Done
  - Download alertmanager: (linux/amd64) ... Error
Error: read manifest from mirror(https://tiup-mirrors.pingcap.com/) failed: invalid signature for file root.json: not enough signatures (2) for threshold 3 in root.json

I can't fix this unit test, can someone help me?

Smityz avatar Jul 15 '22 06:07 Smityz

I think we should fix the underlying problem, that generating the config takes a lot of time (several hours), rather than provide a flag to skip it.

nexustar avatar Jul 15 '22 10:07 nexustar

I think we should fix the underlying problem, that generating the config takes a lot of time (several hours), rather than provide a flag to skip it.

I don't understand why we need to generate configs on all nodes when scaling in. The config seems unchanged. Could you please explain it to me?

Smityz avatar Jul 15 '22 12:07 Smityz

PTAL

Smityz avatar Aug 03 '22 10:08 Smityz

If any PD node is scaled in, we have to regenerate configs for all TiKV and TiDB nodes, because the PD endpoints are written into their startup scripts. And the Prometheus config is always updated whenever a node is added to or removed from the cluster.

I agree that we don't have to regenerate configs for all nodes in some cases, but that could be quite complex to implement; the current approach is a reasonable workaround.
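
To make the dependency concrete, here is a minimal sketch of a startup script rendered from a template that embeds the PD endpoints. The template text, the `renderTiKVScript` helper, and the addresses are simplified, hypothetical stand-ins rather than tiup's actual templates. Removing a PD node changes the rendered script for every TiKV node, while removing another TiKV node would not.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// runScript is a simplified, hypothetical startup script; the real templates differ.
const runScript = "exec bin/tikv-server --addr {{.Addr}} --pd {{.PD}}\n"

var runScriptTmpl = template.Must(template.New("run_tikv").Parse(runScript))

// renderTiKVScript renders the startup script for one TiKV node,
// embedding the comma-joined list of PD endpoints.
func renderTiKVScript(addr string, pdEndpoints []string) (string, error) {
	var sb strings.Builder
	data := struct{ Addr, PD string }{addr, strings.Join(pdEndpoints, ",")}
	err := runScriptTmpl.Execute(&sb, data)
	return sb.String(), err
}

func main() {
	// Made-up addresses, for illustration only.
	pds := []string{"10.0.0.1:2379", "10.0.0.5:2379", "10.0.0.6:2379"}

	before, err := renderTiKVScript("10.0.0.2:20160", pds)
	if err != nil {
		panic(err)
	}
	// Scale in one PD node: the same TiKV node now needs a different script.
	after, err := renderTiKVScript("10.0.0.2:20160", pds[:2])
	if err != nil {
		panic(err)
	}

	fmt.Print(before, after)
	fmt.Println("script changed:", before != after)
}
```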

Could you rename the --ignore-components argument to something like --ignore-config-roles to show that it is for configs? And I think it could be better to mark it as hidden as well.

AstroProfundis avatar Aug 30 '22 03:08 AstroProfundis

Thanks for your explanation, @AstroProfundis. I think maybe we can skip regenerating configs when TiDB/TiKV nodes are scaled in/out or pruned? I think this operation is safe.

Smityz avatar Sep 01 '22 07:09 Smityz

Sorry for the delay...

I think maybe we can skip regenerating configs when TiDB/TiKV nodes are scaled in/out or pruned?

I agree, and I think TiFlash is also safe to be ignored, but I'm not 100% sure about that...

AstroProfundis avatar Oct 12 '22 08:10 AstroProfundis

In our production environment (a TiKV-only cluster), I have used this code many times and found nothing unusual. How about adding this feature as an option? Our cluster has hundreds of nodes, and initializing the config for every node when scaling is really slow.

Smityz avatar Oct 15 '22 10:10 Smityz

I agree that an optional switch that lets users decide which components to skip when updating configs could be reasonable.

Could you rename the --ignore-components argument to something like --ignore-config-roles to show that it is for configs? And I think it could be better to mark it as hidden as well.

How about like this?

AstroProfundis avatar Oct 17 '22 06:10 AstroProfundis

@AstroProfundis It's been a long time. Are you still interested in this PR?

Smityz avatar Nov 24 '22 09:11 Smityz