cluster: support IgnoreInitConfigComps
What problem does this PR solve?
If we have a large number of tikv-server instances, it takes a lot of time (several hours) to generate configs.
What is changed and how it works?
./tiup-cluster scale-in Kvstore_UAT_0 --node <IP:port> -y --ignore-config-roles tikv
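For illustration, a minimal sketch of how such a flag could be wired into a cobra-based CLI like tiup's (`newScaleInCmd` and `scaleIn` are hypothetical stand-ins here, not the actual implementation):

```go
package command

import "github.com/spf13/cobra"

// ignoreConfigRoles holds component roles (e.g. "tikv") whose config
// files should not be regenerated during the operation. Illustrative
// sketch only, not tiup's actual code.
var ignoreConfigRoles []string

// scaleIn is a stub standing in for the real manager call that would
// receive the ignored roles and skip them in the init-config step.
func scaleIn(clusterName string, ignoreRoles []string) error { return nil }

func newScaleInCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:  "scale-in <cluster-name>",
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			return scaleIn(args[0], ignoreConfigRoles)
		},
	}
	cmd.Flags().StringSliceVar(&ignoreConfigRoles, "ignore-config-roles", nil,
		"roles whose configs are not regenerated during this operation")
	// Hidden, as suggested in the review discussion below.
	_ = cmd.Flags().MarkHidden("ignore-config-roles")
	return cmd
}
```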
Check List
Tests
- Unit test
- Integration test
- Manual test (add detailed scripts or steps below)
- No code
Code changes
- Has exported function/method change
- Has exported variable/fields change
- Has interface methods change
- Has persistent data change
Side effects
- Possible performance regression
- Increased code complexity
- Breaking backward compatibility
Related changes
- Need to cherry-pick to the release branch
- Need to update the documentation
Release notes:
NONE
[REVIEW NOTIFICATION]
This pull request has not been approved.
To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.
The full list of commands accepted by this bot can be found here.
Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.
Codecov Report
Base: 56.31% // Head: 50.94% // Decreases project coverage by 5.37% :warning:
Coverage data is based on head (ad17bd3) compared to base (9e2e464). Patch coverage: 100.00% of modified lines in the pull request are covered.
:exclamation: Current head ad17bd3 differs from pull request most recent head 286e65e. Consider uploading reports for the commit 286e65e to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## master #1987 +/- ##
==========================================
- Coverage 56.31% 50.94% -5.37%
==========================================
Files 313 312 -1
Lines 33492 33481 -11
==========================================
- Hits 18858 17055 -1803
- Misses 12415 14212 +1797
+ Partials 2219 2214 -5
| Flag | Coverage Δ | |
|---|---|---|
| tiup | 16.17% <ø> (ø) | |
| unittest | ? | |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/cluster/operation/operation.go | 80.65% <ø> (ø) | |
| components/cluster/command/prune.go | 59.09% <100.00%> (ø) | |
| components/cluster/command/scale_in.go | 75.00% <100.00%> (ø) | |
| components/cluster/command/scale_out.go | 74.29% <100.00%> (ø) | |
| pkg/cluster/manager/builder.go | 67.20% <100.00%> (ø) | |
| components/dm/ansible/worker.go | 0.00% <0.00%> (-100.00%) | :arrow_down: |
| pkg/meta/err.go | 0.00% <0.00%> (-76.19%) | :arrow_down: |
| pkg/cluster/api/error.go | 0.00% <0.00%> (-75.00%) | :arrow_down: |
| pkg/crypto/rand/passwd.go | 0.00% <0.00%> (-75.00%) | :arrow_down: |
| pkg/telemetry/node_info.go | 0.00% <0.00%> (-70.73%) | :arrow_down: |
| ... and 53 more | | |
- Download blackbox_exporter: (linux/amd64) ... Done
- Download node_exporter: (linux/amd64) ... Done
- Download alertmanager: (linux/amd64) ... Error
Error: read manifest from mirror(https://tiup-mirrors.pingcap.com/) failed: invalid signature for file root.json: not enough signatures (2) for threshold 3 in root.json
I can't fix this unit test; can someone help me?
I think we should fix the underlying problem, that generating configs takes a lot of time (several hours), rather than provide a flag to skip it.
I don't understand why we need to regenerate configs on all nodes when scaling in. The configs seem unchanged. Could you please explain it to me?
PTAL
If any PD node is scaled in, we have to regenerate the configs for all TiKV and TiDB nodes, because the PD addresses are part of their startup scripts. And the Prometheus config is always updated when any node is added to or removed from the cluster.
I agree that we don't have to regenerate configs for all nodes in some cases, but that could be quite complex to implement; the current approach is a reasonable workaround.
Could you rename the `--ignore-components` argument to something like `--ignore-config-roles` to show that it is for configs? And I think it could be better to mark it as hidden as well.
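As a side note on implementation, here is a minimal sketch of how an ignore list could filter instances before the init-config step (the `Instance` type and the function name are assumptions for illustration, not tiup's actual types):

```go
package operation

// Instance is a stand-in for tiup's instance type; the fields here
// are assumptions for illustration only.
type Instance struct {
	Role string
	Host string
}

// filterInitConfig drops instances whose role is in ignoreRoles, so
// the init-config step only touches the remaining (usually far
// smaller) set of nodes.
func filterInitConfig(instances []Instance, ignoreRoles []string) []Instance {
	ignored := make(map[string]struct{}, len(ignoreRoles))
	for _, r := range ignoreRoles {
		ignored[r] = struct{}{}
	}
	var kept []Instance
	for _, inst := range instances {
		if _, skip := ignored[inst.Role]; !skip {
			kept = append(kept, inst)
		}
	}
	return kept
}
```

A set lookup keeps the filter linear in the number of instances, which matters for clusters with hundreds of nodes.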
Thanks for your explanation @AstroProfundis. I think maybe we can skip regenerating configs when TiDB/TiKV scale in/out or prune? I think that operation is safe.
Sorry for the delay...
I think maybe we can skip regenerating configs when TiDB/TiKV scale in/out or prune?
I agree, and I think TiFlash is also safe to be ignored, but I'm not 100% sure about that...
In our production environment (a TiKV-only cluster), I have used this code many times and found nothing unusual. How about adding this feature as an option? Our cluster has hundreds of nodes, and running init config on every node when scaling is really slow.
I agree that adding it as an optional switch for users to decide what components should be ignored when updating configs could be reasonable.
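If such a switch is added, it might also make sense to reject roles that are never safe to skip, per the reasoning above about PD and Prometheus; a hypothetical guard:

```go
package operation

import "fmt"

// validateIgnoreRoles rejects roles that are never safe to skip: PD
// changes require regenerating other nodes' configs, and Prometheus
// is always updated on topology changes. Hypothetical helper, not
// tiup's actual code.
func validateIgnoreRoles(roles []string) error {
	for _, r := range roles {
		switch r {
		case "pd", "prometheus":
			return fmt.Errorf("config regeneration cannot be skipped for role %q", r)
		}
	}
	return nil
}
```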
Could you rename the `--ignore-components` argument to something like `--ignore-config-roles` to show that it is for configs? And I think it could be better to mark it as hidden as well.
How about this?
@AstroProfundis It's been a long time. Are you still interested in this PR?