etcd
grpc-proxy flakiness when connecting to etcd cluster
Hi,
I'm deploying a 3-proxy, 3-node etcd cluster, and noticed some flakiness when the proxy starts:
2017-06-23 01:22:01.950109 I | etcdmain: listening for grpc-proxy client requests on 0.0.0.0:2379
2017-06-23 01:22:01.953757 I | etcdserver/api/v3rpc: Failed to dial etcd-cluster-0001.etcd-cluster.default.svc.cluster.local:2379: grpc: the connection is closing; please retry.
2017-06-23 01:22:01.954053 I | etcdserver/api/v3rpc: Failed to dial : grpc: the connection is closing; please retry.
2017-06-23 01:22:01.967483 I | grpcproxy: registered "10.92.6.12:2379" with 60-second lease
The number of "Failed to dial" errors varies between 0 and 3. When there are 3, the grpc proxy never registers with the cluster (the last log line above is missing), but the process doesn't die either.
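To make the two outcomes easier to tell apart across many restarts, here is a rough Python sketch that counts the "Failed to dial" lines and checks whether the "registered" line ever showed up. It only assumes the log format pasted above; `summarize_startup` and the regular expressions are my own illustration, not anything shipped with etcd.

```python
import re

# Patterns are based only on the log lines quoted in this report (assumption:
# other etcd versions may word these messages differently).
DIAL_FAILURE = re.compile(r"Failed to dial (?P<endpoint>\S*): (?P<reason>.*)$")
REGISTERED = re.compile(r'grpcproxy: registered "(?P<addr>[^"]+)"')


def summarize_startup(log_lines):
    """Collect dial failures and the registered address (if any) from proxy logs."""
    failures = []
    registered = None
    for line in log_lines:
        match = DIAL_FAILURE.search(line)
        if match:
            # The endpoint can be empty, as in the "Failed to dial : ..." line.
            failures.append((match.group("endpoint"), match.group("reason")))
            continue
        match = REGISTERED.search(line)
        if match:
            registered = match.group("addr")
    return {"dial_failures": failures, "registered": registered}


sample = [
    "2017-06-23 01:22:01.954053 I | etcdserver/api/v3rpc: Failed to dial : grpc: the connection is closing; please retry.",
    '2017-06-23 01:22:01.967483 I | grpcproxy: registered "10.92.6.12:2379" with 60-second lease',
]
print(summarize_startup(sample))
```

A run where `registered` is `None` while the process is still alive would correspond to the bad case described above.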
Command to bring up proxy:
/usr/local/bin/etcd grpc-proxy start \
  --advertise-client-url="$POD_IP:2379" \
  --listen-addr="0.0.0.0:2379" \
  --resolver-prefix="___grpc_proxy_endpoint" \
  --resolver-ttl=60 \
  --endpoints="etcd-cluster-0000.etcd-cluster.default.svc.cluster.local:2379,etcd-cluster-0001.etcd-cluster.default.svc.cluster.local:2379,etcd-cluster-0002.etcd-cluster.default.svc.cluster.local:2379"
etcd nodes were brought up by etcd-operator.
Versions: etcd v3.2.0, Kubernetes v1.6.4
This might be a client bug related to name resolution failure that's causing it to hang on dial out. It doesn't look proxy-specific.
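One way to probe the name-resolution hypothesis is to resolve every endpoint host right before starting the proxy and see whether any lookup fails. The Python sketch below does that with the standard library only; `split_endpoint` and `unresolvable` are hypothetical helper names, and a clean result at one moment does not rule out a transient DNS failure during the actual dial.

```python
import socket
from urllib.parse import urlsplit


def split_endpoint(endpoint):
    """Return (host, port) for 'host:port' or 'scheme://host:port'."""
    # Without a scheme, urlsplit would misread the host as the scheme,
    # so add a bare '//' to mark the start of the network location.
    if "://" not in endpoint:
        endpoint = "//" + endpoint
    parts = urlsplit(endpoint)
    return parts.hostname, parts.port


def unresolvable(endpoints):
    """Map each endpoint in a comma-separated list to its lookup error, if any."""
    failed = {}
    for endpoint in endpoints.split(","):
        endpoint = endpoint.strip()
        host, port = split_endpoint(endpoint)
        try:
            socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror as err:
            failed[endpoint] = str(err)
    return failed
```

Passing the same string given to `--endpoints` to `unresolvable` returns an empty dict when every host resolves. Running it from inside the proxy pod, rather than from a workstation, is what would make the result relevant here.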
@clusc is there an easy way to reproduce this issue?
it's pretty consistently reproducible for me.
@clusc Sure. But the question was how to easily reproduce it.
Not sure. Maybe try the same Kubernetes version and etcd-operator deployment?
@xiang90 I have a similar issue which I can reproduce very easily with etcdctl. Every second time I get an exception (transport: context canceled).
See output:
[root@hostname1 ~]# /appl/etcd/bin/etcd grpc-proxy start --endpoints="https://hostname1:7002,https://hostname2:7002,https://hostname3:7002" --advertise-client-url="hostname1:23790" --listen-addr="hostname1:23790"
2017-09-06 16:15:45.911404 I | etcdmain: listening for grpc-proxy client requests on hostname1:23790
2017-09-06 16:15:45.936172 E | etcdmain: forgot to set Type=notify in systemd service file?
^C
[root@hostname1 ~]# /appl/etcd/bin/etcd grpc-proxy start --endpoints="https://hostname1:7002,https://hostname2:7002,https://hostname3:7002" --advertise-client-url="hostname1:23790" --listen-addr="hostname1:23790"
2017-09-06 16:18:47.687532 I | etcdmain: listening for grpc-proxy client requests on hostname1:23790
2017-09-06 16:18:47.712193 I | etcdserver/api/v3rpc: Failed to dial hostname3:7002: grpc: the connection is closing; please retry.
2017-09-06 16:18:47.712211 I | etcdserver/api/v3rpc: Failed to dial hostname2:7002: grpc: the connection is closing; please retry.
2017-09-06 16:18:47.712361 E | etcdmain: forgot to set Type=notify in systemd service file?
^C
[root@hostname1 ~]# /appl/etcd/bin/etcd grpc-proxy start --endpoints="https://hostname1:7002,https://hostname2:7002,https://hostname3:7002" --advertise-client-url="hostname1:23790" --listen-addr="hostname1:23790"
2017-09-06 16:18:50.320070 I | etcdmain: listening for grpc-proxy client requests on hostname1:23790
2017-09-06 16:18:50.342465 E | etcdmain: forgot to set Type=notify in systemd service file?
^C
[root@hostname1 ~]# /appl/etcd/bin/etcd grpc-proxy start --endpoints="https://hostname1:7002,https://hostname2:7002,https://hostname3:7002" --advertise-client-url="hostname1:23790" --listen-addr="hostname1:23790"
2017-09-06 16:18:52.918694 I | etcdmain: listening for grpc-proxy client requests on hostname1:23790
2017-09-06 16:18:52.941630 E | etcdmain: forgot to set Type=notify in systemd service file?
2017-09-06 16:18:52.941793 I | etcdserver/api/v3rpc: Failed to dial hostname3:7002: connection error: desc = "transport: context canceled"; please retry.
2017-09-06 16:18:52.942041 I | etcdserver/api/v3rpc: Failed to dial hostname2:7002: grpc: the connection is closing; please retry.
^C
[root@hostname1 ~]# /appl/etcd/bin/etcd grpc-proxy start --endpoints="https://hostname1:7002,https://hostname2:7002,https://hostname3:7002" --advertise-client-url="hostname1:23790" --listen-addr="hostname1:23790"
2017-09-06 16:18:55.662430 I | etcdmain: listening for grpc-proxy client requests on hostname1:23790
2017-09-06 16:18:55.685741 E | etcdmain: forgot to set Type=notify in systemd service file?
^C
[root@hostname1 ~]# /appl/etcd/bin/etcd grpc-proxy start --endpoints="https://hostname1:7002,https://hostname2:7002,https://hostname3:7002" --advertise-client-url="hostname1:23790" --listen-addr="hostname1:23790"
2017-09-06 16:19:00.607871 I | etcdmain: listening for grpc-proxy client requests on hostname1:23790
2017-09-06 16:19:00.630928 E | etcdmain: forgot to set Type=notify in systemd service file?
2017-09-06 16:19:00.631082 I | etcdserver/api/v3rpc: Failed to dial hostname2:7002: connection error: desc = "transport: context canceled"; please retry.
2017-09-06 16:19:00.631308 I | etcdserver/api/v3rpc: Failed to dial hostname3:7002: grpc: the connection is closing; please retry.
@zbindenren can you share your reproduction steps and your etcd/etcdctl versions?
@xiang90 etcd and etcdctl both v3.2.7. Endpoints secured by certificate.
@xiang90 The same problem occurs with v3.2.8.
And #8631 is probably a duplicate.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.