etcd
etcd copied to clipboard
Supporting setting `LocalAddr` in peer communication
What would you like to be added?
Currently in peer communication, the local address is automatically chosen by the golang standard lib,
https://github.com/etcd-io/etcd/blob/d8f077151da01d34162c5e6bdccac315777c4369/client/pkg/transport/transport.go#L33-L43
We should support setting a LocalAddr, which defaults to the advertise Peer address if enabled. It should be disabled by default to keep the existing behavior unless users intentionally enable it.
Why is this needed?
When accepting a connection from a peer, each etcd member checks the peer's remote address against the Subject Alternative Name (SAN) field in the peer's certificate, and reject the connection if they don't match. Please read https://etcd.io/docs/v3.5/op-guide/security/#notes-for-tls-authentication.
See also https://github.com/etcd-io/etcd/blob/d8f077151da01d34162c5e6bdccac315777c4369/client/pkg/transport/listener_tls.go#L192-L221
Sometimes the peer may has multiple IP addresses, even including dynamic (i.e. floating) IP addresses. Probably only part of the IP addresses are valid against the SAN field in the certificate. But if an IP address which doesn't match the SAN in certificate is chosen by the golang standard lib, then it will fail to communicate with other etcd members.
@ahrtr Would like to work on this!
Thanks @highpon .
ping @serathius @jmhbnz @wenjiaswe @fuweid for feedback.
cc @aojea for opinion as well
cc @aojea for opinion as well
I can see that there are multiple flags that declare addresses identifying the client, should you not use one of those since those are known to be reachable?
cc @aojea for opinion as well
I can see that there are multiple flags that declare addresses identifying the client, should you not use one of those since those are known to be reachable?
Thanks for the feedback. Yes, I was planning to reuse the --initial-advertise-peer-urls
if users enable this feature (disabled by default to keep the existing behaviour). I am not sure whether it's a best practice to set LocalAddr, and whether there are any potential issue. Any comment?
$ etcd \
--name m1 \
--listen-client-urls http://host1:2379 \
--advertise-client-urls http://host1:2379 \
--listen-peer-urls http://host1:2380 \
--initial-advertise-peer-urls http://host1:2380
Another problem is that it's hard to verify it in workflow CI. For example, how to create multiple IP addresses on the environment.
I am not sure whether it's a best practice to set LocalAddr, and whether there are any potential issue. Any comment?
In an etcd cluster there are some strong requirements or network connectivity between the peers, my experience is that every advanced networking knob like this can have also an undesired effect of providing a footgun for users that starts to implement complex scenarios just because they can or because they came with that idea and then they hit a problem and they start to ask for new knobs. I always prefer to have the use cases and document the scenarios with how the new knobs are supposed to be used before adding them.
I always prefer to have the use cases and document the scenarios with how the new knobs are supposed to be used before adding them.
Yes, agreed on this. It's a real problem discovered in our testing when etcd and kube-vip coexist. Just as I mentioned in https://github.com/etcd-io/etcd/issues/17068#issue-2025936995, When an etcd member accepts a connection from a peer, it checks the peer's remote address against the Subject Alternative Name (SAN) field in the peer's certificate, and reject the connection if they don't match. It's exactly the reason why the LocalAddr matters here.
But when kube-vip coexist with etcd on the same host, kube-vip may dynamically add a floating IP to the host. Especially for the case of IPv6, it adds the IP to the beginning of the interface's IP list. Since etcd doesn't specify the LocalAddr when communicating with a peer, the floating IP is automatically chosen as the LocalAddr
by the golang standard lib, eventually the connection is rejected by the peer.
I believe this is also a common issue. It applies the cases where the hosts on which the etcd runs have multiple IP addresses, especially dynamical floating IP.
Explicitly, it would be great if we can get feedback from network experts on:
-
In the beginning, we will disable the new feature by default, and etcd will not set a
LocalAddr
when communicating with a peer unless users intentionally enable it. But If there is no any concern or side effect, eventually we may want to enable it by default. -
Another problem is how to automatically verify it in workflow CI using an e2e test case to prevent any regression. I don't see an easy way to mimic the case of the host having multiple IP addresses in workflow CI environment. Probably we can try unit test?
that makes sense, then add just an option to define the localAddress for connecting to the apiserver, if is empty just don't set the localAddr field, this will make the change completely backwards compatible and you don't have to worry about the future.
Another problem is how to automatically verify it in workflow CI using an e2e test case to prevent any regressio
some ideas, if you are in the same node you can play with the localhost address 127.0.0.1 and the one in the node
- create a TCP listener that receives the connection and assert on the address you want
- test the negative case, verify that fails to connect if you set a wrong address
you can play with the localhost address 127.0.0.1 and the one in the node
Thanks for the suggestion.
@highpon Please follow comments below to update https://github.com/etcd-io/etcd/pull/17085. Thanks
-
Let's add an option
--set-member-localaddr
(bool
), which defaults tofalse
. If users enable it, then let's use the host in--initial-advertise-peer-urls
as theLocalAddr
when etcd communicates with a peer. -
Follow comment above https://github.com/etcd-io/etcd/issues/17068#issuecomment-1862885759 to add an unit test.
**play with the localhost address 127.0.0.1 and the one in the node**
. We need to test both positive and negative cases.- Please also add an e2e test. We just need to verify positive case in e2e case to ensure it doesn't break anything. Specifically, we just need to set
--set-member-localaddr
totrue
in both 1-member and 3-members clusters, and ensure nothing is broken.
- Please also add an e2e test. We just need to verify positive case in e2e case to ensure it doesn't break anything. Specifically, we just need to set
@ahrtr Sure! Thanks for your help! I would like to update #17085 for your comment.
I've tried to pick up the ball to implement these changes here: https://github.com/flawedmatrix/etcd/tree/fix/17068
One question I had was whether we only wanted to set LocalAddr to an IP address, or should we try to resolve the DNS name to an address from --initial-advertise-peer-urls
?
~~The major problem I ran into was that I've been unable to get the e2e test to fail Mutual TLS when playing around with using the host IP address or 127.0.0.1. It seems the etcd nodes would only start up properly if all the IP addresses provided in --listen-peer-urls
, --initial-advertise-peer-urls
, and --initial-cluster
matched. I've been unable to get the Golang library to infer LocalAddr to be 127.0.0.1 if I used the host IP for the three flags I mentioned.~~
~~@ahrtr Any thoughts as to how one could properly test this feature in an e2e test?~~
After revisiting this issue, I've resolved the issue of how to e2e test this feature.
One question I had was whether we only wanted to set LocalAddr to an IP address, or should we try to resolve the DNS name to an address from
--initial-advertise-peer-urls
?
The answer should be the latter based on https://github.com/etcd-io/etcd/issues/17068#issuecomment-1863121505