etcd
etcd copied to clipboard
Added member interaction into EtcdProcessCluster
As I promised in https://github.com/etcd-io/etcd/pull/14354, trying to introduce to EtcdProcessCluster functions for changing cluster member list.
As an example rewrote e2e tests from mentioned pull request.
Please resolve the conflict.
@ahrtr done, rebased on main
I was asked by @ahrtr to do it. In general I'm completely okay with the way my test is currently written. Honestly, idk how to simplify test without changing in common libraries. Maybe it's better to not to change anything at all (then feel free to reject PR, I'm okay with it).
For me problem is making list of members dynamic, which would be only used for this test. Could we maybe check if the test itself can be simpler? For testing endpoints-auto-sync-interval
do we really need to start a whole etcd process?
My understanding is that we are just looking for a Resolver state updated
log in proxy. I think that just calling a MemberAdd
on cluster should be enough.
Hmm, can't agree here. Even using logs for making progress in tests make me uncomfortable, honestly. But I definitely would not say that something is working based on logs. I prefer to think that "it doesn't work until you run it explicitly and get expected result".
I do not get time to have a deeper review on this PR, but my immediate feeling is it's a little over complicated.
The gap mentioned in https://github.com/etcd-io/etcd/pull/14354#discussion_r953155400 is that the current test framework has interface MemberAdd, but doesn't have interface method something like MemberLaunch
or LaunchMember
or StartMember
or MemberStart
. I think we do need to support dynamically changing the member in the framework.
Please always keep OCP (Open Close principle) in mind: Open for extension, but close for modification. When updating existing framework, please try not to change the existing logic. Instead, please try to make your new code adapt to the existing framework.
I'm afraid we're stuck with this PR (because we're in a kinda closed loop).
Changed e2e test cannot be rewritten with common or integration framework, because we're testing interaction through different running instances (including updates via network).
The same story is here: https://github.com/etcd-io/etcd/blob/main/tests/e2e/ctl_v3_snapshot_test.go#L258
Both these tests need actual member list changes. Both tests do it via raw commands. I tried to add these commands to e2e framework, but you think it's unnecessary. You're contributors, and if you're saying that it's easer to use raw commands for couple of tests - let it be.
When I tried to enhance the existing e2e test framework, eventually it's similar to this PR. But the interface definition in my commit looks clearer, I also get rid of the nextSeq
. Please consider to enhance this PR per https://github.com/ahrtr/etcd/commit/9649fd7538cecaaabdef88aa159129629262da96
Codecov Report
Merging #14404 (9649fd7) into main (4d57eb8) will decrease coverage by
0.72%
. The diff coverage is48.48%
.
:exclamation: Current head 9649fd7 differs from pull request most recent head 76ab474. Consider uploading reports for the commit 76ab474 to get more accurate results
@@ Coverage Diff @@
## main #14404 +/- ##
==========================================
- Coverage 75.47% 74.75% -0.73%
==========================================
Files 457 457
Lines 37183 37207 +24
==========================================
- Hits 28065 27813 -252
- Misses 7361 7612 +251
- Partials 1757 1782 +25
Flag | Coverage Δ | |
---|---|---|
all | 74.75% <48.48%> (-0.73%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
client/v3/mock/mockserver/mockserver.go | 0.00% <0.00%> (ø) |
|
server/etcdserver/raft.go | 88.29% <71.42%> (-1.48%) |
:arrow_down: |
etcdctl/ctlv3/command/move_leader_command.go | 73.68% <100.00%> (+2.63%) |
:arrow_up: |
server/etcdserver/server.go | 84.63% <100.00%> (-0.93%) |
:arrow_down: |
server/proxy/grpcproxy/election.go | 9.09% <0.00%> (-72.73%) |
:arrow_down: |
server/proxy/grpcproxy/lock.go | 33.33% <0.00%> (-66.67%) |
:arrow_down: |
...ver/proxy/grpcproxy/adapter/lock_client_adapter.go | 33.33% <0.00%> (-66.67%) |
:arrow_down: |
...proxy/grpcproxy/adapter/election_client_adapter.go | 6.89% <0.00%> (-62.07%) |
:arrow_down: |
etcdutl/etcdutl/migrate_command.go | 21.05% <0.00%> (-53.95%) |
:arrow_down: |
...xy/grpcproxy/adapter/maintenance_client_adapter.go | 17.14% <0.00%> (-17.15%) |
:arrow_down: |
... and 42 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Maybe I didn't get the idea, but code looks broken for me:
-
Cluster state is omitted It's necessary to pass "initial-cluster-state existing" when new member is added (I think that was the reason to add
SetInitialCluster
to EtcdServerProcessConfig). -
nextSeq With "epc.Cfg.ClusterSize", you'll try to reuse occupied ports in case of this sequence:
- Start cluster of >1 nodes
- Remove member (with least ports number)
- Add member (it will try to use the same ports as the member with highest ports number)
The commit is just to demonstrate the high level idea instead of a PR ready for review.
Then I didn't get the idea, sorry.
You got rid of things that was introduced in order to make code workable.
2. nextSeq With "epc.Cfg.ClusterSize", you'll try to reuse occupied ports in case of this sequence:
The initial value for nextSq
is cfg.ClusterSize
, and incremental by 1 each time calling StartNewProc
. I don't understand why it can't be removed and use cfg.ClusterSize
or len(epc.Procs)
directly. You just need to update cfg.ClusterSize
and epc.Procs
each time after calling StartNewProc
.
Let's assume that initial configuration is:
- base port 2000
- 2 nodes
Then node A has port 2000 + 15 = 2005, node B has port 2000 + 25 = 2010.
Now imaging that I'm first removing node A and then adding new node C. What will I get?
If I use cfg.ClusterSize or len(epc.Procs), I'll get port for new node = 2000 + 2*5 = 2010, which is overlapping with already running node B.
That's why I need some counter that doesn't depend on the current state of cluster. That's why nextSq
was introduced.
Let's assume that initial configuration is:
* base port 2000 * 2 nodes
Then node A has port 2000 + 1_5 = 2005, node B has port 2000 + 2_5 = 2010.
Now imaging that I'm first removing node A and then adding new node C. What will I get?
If I use cfg.ClusterSize or len(epc.Procs), I'll get port for new node = 2000 + 2*5 = 2010, which is overlapping with already running node B. That's why I need some counter that doesn't depend on the current state of cluster. That's why
nextSq
was introduced.
Make sense to me, thanks for the clarification.
@biosvs Could you resolve the inline comments? Please also rebase this PR.
Continue to work on this PR in https://github.com/etcd-io/etcd/pull/14589