etcd
Duplicate names in `ETCD_INITIAL_CLUSTER` not handled correctly
What happened?
If you don't pass a --name argument to your etcd processes, they will all have the name default and the cluster will operate normally. However, when you add a member, the generated ETCD_INITIAL_CLUSTER variable will have multiple entries with the name "default". When this environment variable is used, etcd will parse these into a mapping under a single key ("default") with multiple URLs, and create a single member. See https://github.com/etcd-io/etcd/blob/63a1cc3fe40bace6898289dec35a9aad05163889/server/etcdserver/api/membership/cluster.go#L83-L86
This leads to the confusing error message "member count is unequal". The documentation on https://etcd.io/docs/v3.5/op-guide/runtime-configuration/ mentions this failure, but the situation is different.
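The collapse described above can be sketched with a minimal standalone program. The helper `parseInitialCluster` below is hypothetical and only mirrors the merge for illustration; etcd's actual parsing lives in `types.NewURLsMap`:

```go
package main

import (
	"fmt"
	"strings"
)

// parseInitialCluster mimics, in simplified form, how a
// "name=url,name=url" initial-cluster string becomes a
// name -> peer-URLs map. Hypothetical sketch, not etcd code.
func parseInitialCluster(s string) map[string][]string {
	m := map[string][]string{}
	for _, pair := range strings.Split(s, ",") {
		name, url, ok := strings.Cut(pair, "=")
		if !ok {
			continue
		}
		// Duplicate names silently merge under one key, so two
		// distinct "default" members collapse into a single entry.
		m[name] = append(m[name], url)
	}
	return m
}

func main() {
	m := parseInitialCluster("default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001")
	fmt.Println(len(m))            // 2 keys, although 3 members were intended
	fmt.Println(len(m["default"])) // both "default" peer URLs end up under one key
}
```

With the generated value from `member add`, the map has only two keys, which is why the new server sees one fewer member than the cluster reports.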
What did you expect to happen?
Either
a. member add should fail, saying it cannot generate a valid ETCD_INITIAL_CLUSTER due to duplicate names, or
b. etcd should accept duplicate names in ETCD_INITIAL_CLUSTER and treat them as separate members. This can be accomplished by updating func NewClusterFromURLsMap as follows:
c := NewCluster(lg, opts...)
for name, urls := range urlsmap {
	for idx := range urls {
		m := NewMember(name, urls[idx:idx+1], token, nil)
		[...]
I don't know if there's a real need to be able to specify multiple URLs for a single member.
How can we reproduce it (as minimally and precisely as possible)?
You need three terminals, x, y, and z:
x$ mkdir -p test_case/{a,b,c}/{data/member,wal}
x$ ETCD_INITIAL_CLUSTER="a=http://127.0.0.1:40000,b=http://127.0.0.1:40001" ETCD_INITIAL_CLUSTER_STATE=new etcd --name a --{initial-advertise,listen}-peer-urls=http://127.0.0.1:40000 --{advertise,listen}-client-urls=http://127.0.0.1:50000 --data-dir test_case/a/data --wal-dir test_case/a/wal
y$ ETCD_INITIAL_CLUSTER="a=http://127.0.0.1:40000,b=http://127.0.0.1:40001" ETCD_INITIAL_CLUSTER_STATE=new etcd --name b --{initial-advertise,listen}-peer-urls=http://127.0.0.1:40001 --{advertise,listen}-client-urls=http://127.0.0.1:50001 --data-dir test_case/b/data --wal-dir test_case/b/wal
[now kill both servers with Ctrl-C]
x$ etcd --listen-peer-urls=http://127.0.0.1:40000 --{advertise,listen}-client-urls=http://127.0.0.1:50000 --data-dir test_case/a/data --wal-dir test_case/a/wal
y$ etcd --listen-peer-urls=http://127.0.0.1:40001 --{advertise,listen}-client-urls=http://127.0.0.1:50001 --data-dir test_case/b/data --wal-dir test_case/b/wal
z$ ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c http://127.0.0.1:40002
Added member named c with ID 7b4d6e3edb76bc59 to cluster
ETCD_NAME="c"
ETCD_INITIAL_CLUSTER="default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
ETCD_INITIAL_CLUSTER_STATE="existing"
z$ export ETCD_NAME="c"
z$ export ETCD_INITIAL_CLUSTER="default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
z$ export ETCD_INITIAL_CLUSTER_STATE="existing"
z$ etcd --listen-peer-urls=http://127.0.0.1:40002 --{advertise,listen}-client-urls=http://127.0.0.1:50002 --data-dir test_case/c/data --wal-dir test_case/c/wal
[...]
member count is unequal
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcd --version
# paste output here
$ etcdctl version
# paste output here
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
etcd 3.5.2
Relevant log output
No response
A member can have multiple client or peer URLs. So in this case, you must specify the flag --name. But I agree that we should add a warning if the flag --name isn't present. Feel free to submit a PR for this. Thanks.
@ahrtr Would it be okay if I work on this?
@Divya063 Definitely yes. Thank you!
Hey @mortehu
I was trying to reproduce the issue using the commands you gave.
First of all, I think the command for terminal z is wrong: z$ ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c http://127.0.0.1:40002. It gave me the error: Error: too many arguments, did you mean --peer-urls=http://127.0.0.1:40002
After that I ran ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c --peer-urls=http://127.0.0.1:40002, and the output was as follows.
{"level":"warn","ts":"2022-04-12T00:20:06.341-0700","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_ENDPOINT=http://localhost:50000"}
Member f6f1fd0cdb6d6ac0 added to cluster cdf818194e3a8c32
ETCD_NAME="c"
ETCD_INITIAL_CLUSTER="default=http://localhost:2380,c=http://127.0.0.1:40002"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:40002"
ETCD_INITIAL_CLUSTER_STATE="existing"
After adding the member, I exported the required variables and ran etcd --listen-peer-urls=http://127.0.0.1:40002 --{advertise,listen}-client-urls=http://127.0.0.1:50002 --data-dir test_case/c/data --wal-dir test_case/c/wal, but I didn't get the "member count is unequal" error.
Instead, the error was:
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"existing"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_NAME","variable-value":"c"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--listen-peer-urls=http://127.0.0.1:40002","--advertise-client-urls=http://127.0.0.1:50002","--listen-client-urls=http://127.0.0.1:50002","--data-dir","test_case/c/data","--wal-dir","test_case/c/wal"]}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"etcdmain/etcd.go:116","msg":"server has already been initialized","data-dir":"test_case/c/data","dir-type":"member"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"embed/etcd.go:121","msg":"configuring peer listeners","listen-peer-urls":["http://127.0.0.1:40002"]}
{"level":"info","ts":"2022-04-12T00:21:32.888-0700","caller":"embed/etcd.go:129","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:50002"]}
{"level":"info","ts":"2022-04-12T00:21:32.888-0700","caller":"embed/etcd.go:307","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"7d3ca1f51","go-version":"go1.18","go-os":"linux","go-arch":"amd64","max-cpu-set":12,"max-cpu-available":12,"member-initialized":false,"name":"c","data-dir":"test_case/c/data","wal-dir":"test_case/c/wal","wal-dir-dedicated":"test_case/c/wal","member-dir":"test_case/c/data/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","wait-cluster-ready-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://127.0.0.1:40002"],"advertise-client-urls":["http://127.0.0.1:50002"],"listen-client-urls":["http://127.0.0.1:50002"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"c=http://127.0.0.1:40002,default=http://127.0.0.1:40000,default=http://127.0.0.1:40001","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
{"level":"warn","ts":"2022-04-12T00:21:32.888-0700","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"test_case/c/data\" exist, but the permission is \"drwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"warn","ts":"2022-04-12T00:21:32.888-0700","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"test_case/c/data/member\" exist, but the permission is \"drwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"info","ts":"2022-04-12T00:21:32.888-0700","caller":"storage/backend.go:81","msg":"opened backend db","path":"test_case/c/data/member/snap/db","took":"82.44µs"}
{"level":"warn","ts":"2022-04-12T00:21:32.888-0700","caller":"schema/schema.go:43","msg":"Failed to detect storage schema version. Please wait till wal snapshot before upgrading cluster."}
{"level":"info","ts":"2022-04-12T00:21:33.006-0700","caller":"embed/etcd.go:383","msg":"closing etcd server","name":"c","data-dir":"test_case/c/data","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://127.0.0.1:50002"]}
{"level":"info","ts":"2022-04-12T00:21:33.006-0700","caller":"embed/etcd.go:385","msg":"closed etcd server","name":"c","data-dir":"test_case/c/data","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://127.0.0.1:50002"]}
{"level":"fatal","ts":"2022-04-12T00:21:33.006-0700","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:aab0e09a079f9f55 Members:[&{ID:33cf8d3d56df1746 RaftAttributes:{PeerURLs:[http://127.0.0.1:40000] IsLearner:false} Attributes:{Name:default ClientURLs:[http://127.0.0.1:50000]}} &{ID:8d0cef3f13600fd7 RaftAttributes:{PeerURLs:[http://127.0.0.1:40001] IsLearner:false} Attributes:{Name:default ClientURLs:[http://127.0.0.1:50001]}}] RemovedMemberIDs:[]}: PeerURLs: no match found for existing member (33cf8d3d56df1746, [http://127.0.0.1:40000]), last resolver error (len([\"http://127.0.0.1:40000\"]) != len([\"http://127.0.0.1:40000\" \"http://127.0.0.1:40001\"]))","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/home/nisarg1499/opensource/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/home/nisarg1499/opensource/etcd/server/etcdmain/main.go:40\nmain.main\n\t/home/nisarg1499/opensource/etcd/server/main.go:32\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Can you please tell me where I went wrong in reproducing the error? I followed the same given commands for terminal x and y.
hey, I'm looking for a beginner-friendly issue if this one is available
Thanks @keremgocen , let's double confirm with @Divya063 firstly to avoid doing duplicate work.
@Divya063 are you still working on this?
@keremgocen Do let me know if you are able to replicate the issue. I am also looking to work on some beginner-friendly issues. @ahrtr
Can you please tell me where I went wrong in reproducing the error? I followed the same given commands for terminal x and y.
Two comments:
- The environment variable should be ETCDCTL_ENDPOINTS instead of ETCDCTL_ENDPOINT;
- You need to start a cluster with multiple members, i.e. 3.
Thanks a lot for your reply. I'll check it.
Looks like no progress on this issue.
I would like to work on it.
I read the relevant code and found that Config.Name only plays an actual role when the member is started for the first time -- it is used to determine whether the member is local or remote: https://github.com/etcd-io/etcd/blob/main/server/etcdserver/cluster_util.go#L129
At other times, it is just an identifier without any constraints. The same member can even be started with a different name each time.
So I am more inclined to accept duplicate names in ETCD_INITIAL_CLUSTER and treat them as separate members.
What's your opinion? Thanks! @serathius @ahrtr
@nic-chen are you working on this? @ahrtr I was able to reproduce the issue. If @nic-chen is not working on this can I take it up? Also which of the two approaches would you suggest for solving the issue?
Just as I mentioned previously https://github.com/etcd-io/etcd/issues/13757#issuecomment-1057718054, each member can have multiple peer URLs. In the following example, http://1.1.1.1:2380 and http://2.2.2.2:2380 are regarded as two peer URLs of the member mach0. I don't think we should change this existing behavior.
mach0=http://1.1.1.1:2380,mach0=http://2.2.2.2:2380,mach1=http://3.3.3.3:2380,mach2=http://4.4.4.4:2380
I think we just need to print a warning message if users do not provide a value for --name.
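The proposed warning could look roughly like the sketch below. The function name `warnIfDefaultName` and the message text are assumptions for illustration, not etcd's actual implementation; the only grounded fact is that an unset --name leaves the member named "default", which is what makes the duplicate-name merge possible.

```go
package main

import (
	"fmt"
	"os"
)

// warnIfDefaultName sketches the proposed check: if the operator never
// passed --name, the member keeps the default name "default", and
// duplicate "default" entries in --initial-cluster silently merge into
// a single member. Hypothetical helper, not etcd code.
func warnIfDefaultName(name string) (string, bool) {
	if name != "default" {
		return "", false
	}
	return `--name is unset ("default"); duplicate "default" entries in --initial-cluster are merged into one member`, true
}

func main() {
	if msg, ok := warnIfDefaultName("default"); ok {
		fmt.Fprintln(os.Stderr, "warning: "+msg)
	}
}
```

Emitting this once at startup would steer operators toward passing an explicit --name before they ever reach the confusing "member count is unequal" failure.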
Thanks for the explanation! I missed that comment...
hi @UtR491
Sure. I reproduced and fixed it locally; I just haven't finished testing, and I wanted to wait for a reply because I'm not that familiar with etcd.
A PR will be submitted this week.
If you can fix it and add test cases quickly, a PR is welcome; I wouldn't mind.