Cross-Edge-Cloud communication failed

Open bluven opened this issue 3 years ago • 6 comments

What happened: The Cross-Edge-Cloud test case provided in https://edgemesh.netlify.app/guide/test-case.html failed to pass.

[root@shanghai edgemesh-1.12.0]# kubectl exec -ti busybox-sleep-edge-675c5b84f8-bmcd6 -- sh
/ # telnet tcp-echo-cloud-svc.cloudzone 2701
telnet: bad address 'tcp-echo-cloud-svc.cloudzone'

What you expected to happen:

Edge pods can access the cloud service.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: The edgemesh-agent info:

[root@shanghai ~]# kubectl get po -o wide -n kubeedge
NAME                   READY   STATUS    RESTARTS   AGE    IP            NODE       NOMINATED NODE   READINESS GATES
edgemesh-agent-2zxct   1/1     Running   0          2d6h   10.22.46.11   shanghai   <none>           <none>
edgemesh-agent-58rdt   1/1     Running   0          2d6h   10.22.46.41   edge1      <none>           <none>
edgemesh-agent-d7zv7   1/1     Running   0          2d6h   10.22.46.16   node1      <none>           <none>
edgemesh-agent-dx8t7   1/1     Running   0          2d6h   10.22.46.15   edge2      <none>           <none>
edgemesh-agent-f74df   1/1     Running   0          2d     10.0.2.15     edge3      <none>           <none>

The resolv.conf file in the busybox-sleep-edge pod:

cat /etc/resolv.conf 
nameserver 169.254.96.16
search edgezone.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5

Some logs from the edgemesh-agent:

I1010 14:11:20.890204       1 log.go:184] [INFO] 172.17.0.6:52977 - 2 "AAAA IN tcp-echo-cloud-svc.cloudzone. udp 46 false 512" NXDOMAIN qr,rd,ra 121 0.052209701s
I1010 14:11:20.890668       1 log.go:184] [INFO] 172.17.0.6:37848 - 3 "AAAA IN tcp-echo-cloud-svc.cloudzone.edgezone.svc.cluster.local. udp 73 false 512" NXDOMAIN qr,aa,rd 166 0.000227083s
I1010 14:11:20.890932       1 log.go:184] [INFO] 172.17.0.6:37736 - 4 "AAAA IN tcp-echo-cloud-svc.cloudzone.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,aa,rd 157 0.000116228s
I1010 14:11:20.891135       1 log.go:184] [INFO] 172.17.0.6:48259 - 5 "AAAA IN tcp-echo-cloud-svc.cloudzone.cluster.local. udp 60 false 512" NXDOMAIN qr,aa,rd 153 0.000080203s
I1010 14:11:20.949602       1 log.go:184] [INFO] 172.17.0.6:34899 - 6 "AAAA IN tcp-echo-cloud-svc.cloudzone.openstacklocal. udp 61 false 512" NXDOMAIN qr,rd,ra 136 0.058321129s
I1010 14:11:21.000222       1 log.go:184] [INFO] 172.17.0.6:33599 - 7 "A IN tcp-echo-cloud-svc.cloudzone. udp 46 false 512" NXDOMAIN qr,rd,ra 121 0.050333411s
I1010 14:11:21.000624       1 log.go:184] [INFO] 172.17.0.6:37327 - 8 "A IN tcp-echo-cloud-svc.cloudzone.edgezone.svc.cluster.local. udp 73 false 512" NXDOMAIN qr,aa,rd 166 0.000159778s
I1010 14:11:21.000847       1 log.go:184] [INFO] 172.17.0.6:56653 - 9 "A IN tcp-echo-cloud-svc.cloudzone.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,aa,rd 157 0.000085544s
I1010 14:11:21.001066       1 log.go:184] [INFO] 172.17.0.6:55967 - 10 "A IN tcp-echo-cloud-svc.cloudzone.cluster.local. udp 60 false 512" NXDOMAIN qr,aa,rd 153 0.000096511s
I1010 14:11:21.055726       1 log.go:184] [INFO] 172.17.0.6:48970 - 11 "A IN tcp-echo-cloud-svc.cloudzone.openstacklocal. udp 61 false 512" NXDOMAIN qr,rd,ra 136 0.054526477s
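
The query names in the log above follow from the resolv.conf shown earlier: the short name is tried as-is and with each `search` suffix appended. A minimal Python sketch of that expansion follows. The function name `candidate_names` is hypothetical, the ordering simply mirrors the log, and real resolvers differ in ordering and in how they treat names with at least `ndots` dots.

```python
def candidate_names(name, search, ndots):
    """List the fully qualified names a stub resolver may try for `name`.

    Simplification: the absolute name comes first, mirroring the query order
    in the log above; real resolvers differ in ordering and fallback rules.
    """
    if name.endswith("."):
        return [name]              # already fully qualified, no expansion
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute]          # enough dots: treated as absolute here
    return [absolute] + [f"{name}.{suffix}." for suffix in search]


# Values copied from the resolv.conf shown earlier.
search = ["edgezone.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "openstacklocal"]
for fqdn in candidate_names("tcp-echo-cloud-svc.cloudzone", search, ndots=5):
    print(fqdn)
```

The third candidate, tcp-echo-cloud-svc.cloudzone.svc.cluster.local., is the usual cluster FQDN for a service named tcp-echo-cloud-svc in the cloudzone namespace, and the log shows NXDOMAIN for it as well.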

Environment:

  • EdgeMesh version: v1.12.0-dirty
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
  • KubeEdge version(e.g. cloudcore --version and edgecore --version): v1.11.0

  • Cloud nodes Environment:
    • Hardware configuration (e.g. lscpu):
    • OS (e.g. cat /etc/os-release):
    • Kernel (e.g. uname -a):
    • Go version (e.g. go version):
    • Others:
  • Edge nodes Environment:
    • edgecore version (e.g. edgecore --version):
    • Hardware configuration (e.g. lscpu):
    • OS (e.g. cat /etc/os-release):
    • Kernel (e.g. uname -a):
    • Go version (e.g. go version):
    • Others:

bluven, Oct 10 '22 06:10

Another problem occurred: when I ran the HTTP test case again, I got a connection reset error:

/ # curl hostname-svc:12345
curl: (56) Recv failure: Connection reset by peer

bluven, Oct 10 '22 07:10

Interesting: I created another node, edge3, which is located in a different subnet, and when I executed telnet tcp-echo-cloud-svc.cloudzone 2701 there, it worked.

However, I still couldn't get curl hostname-svc:12345 to work from pods on the edge3 node.

bluven, Oct 10 '22 09:10

Can you post the edgemesh-agent logs from when you run curl hostname-svc:12345?

Poorunga, Oct 11 '22 06:10

@Poorunga

The edgemesh-agent log from client pod side:

I1012 17:44:03.179985       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.11/tcp/20006]} success
I1012 17:44:03.184605       1 loadbalancer.go:731] Dial libp2p network between coredns-7f6cbbb7b8-p94p4 - {udp shanghai 10.234.64.5:53}
I1012 17:44:03.187717       1 tunnel.go:239] New stream between peer {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.41/tcp/20006]} success
E1012 17:44:03.441571       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge1 error: libp2p dial {tcp edge1 172.17.0.2 9376} err: Proxy.type is FAILED"
I1012 17:44:03.441698       1 tunnel.go:239] New stream between peer {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/10.22.46.41/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1012 17:44:03.946424       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge1 error: libp2p dial {tcp edge1 172.17.0.2 9376} err: Proxy.type is FAILED"
I1012 17:44:03.946548       1 tunnel.go:239] New stream between peer {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.41/tcp/20006]} success
E1012 17:44:04.947965       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge1 error: libp2p dial {tcp edge1 172.17.0.2 9376} err: Proxy.type is FAILED"
I1012 17:44:04.948076       1 tunnel.go:239] New stream between peer {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.41/tcp/20006]} success
E1012 17:44:06.953762       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge1 error: libp2p dial {tcp edge1 172.17.0.2 9376} err: Proxy.type is FAILED"
E1012 17:44:06.953814       1 proxysocket.go:98] "Failed to connect to balancer" err="failed to connect to an endpoint"

The edgemesh-agent log from hostname-edge pod side:

I1012 17:44:03.085064       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1012 17:44:03.086548       1 tunnel.go:369] max retries for dial
E1012 17:44:03.086561       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge1" ip:"172.17.0.2" port:9376  err: dial tcp 172.17.0.2:9376: connect: connection refused
I1012 17:44:03.338454       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1012 17:44:03.338998       1 tunnel.go:369] max retries for dial
E1012 17:44:03.339012       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge1" ip:"172.17.0.2" port:9376  err: dial tcp 172.17.0.2:9376: connect: connection refused
I1012 17:44:03.843240       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1012 17:44:03.843771       1 tunnel.go:369] max retries for dial
E1012 17:44:03.843782       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge1" ip:"172.17.0.2" port:9376  err: dial tcp 172.17.0.2:9376: connect: connection refused
I1012 17:44:04.844957       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1012 17:44:04.845481       1 tunnel.go:369] max retries for dial
E1012 17:44:04.845492       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge1" ip:"172.17.0.2" port:9376  err: dial tcp 172.17.0.2:9376: connect: connection refused

bluven, Oct 12 '22 09:10

Is your pod (172.17.0.2:9376) running normally?

Poorunga, Oct 13 '22 03:10

@Poorunga I cleaned up the old resources, ran the examples again, and found another problem.

First, here is the pod information:

[root@shanghai edgemesh-1.12.0]# kubectl get po -o wide
NAME                                   READY   STATUS    RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
alpine-test                            1/1     Running   0          7m1s    10.234.65.14   node1   <none>           <none>
hostname-edge-5cd47b65d5-bnwxl         1/1     Running   0          2m50s   172.17.0.7     edge3   <none>           <none>
hostname-lb-edge-5cdf5c758c-75jmp      1/1     Running   0          2m12s   172.17.0.3     edge1   <none>           <none>
hostname-lb-edge-5cdf5c758c-ccsqp      1/1     Running   0          2m12s   172.17.0.8     edge3   <none>           <none>
hostname-lb-edge-5cdf5c758c-pfkxx      1/1     Running   0          2m12s   172.17.0.2     edge2   <none>           <none>
net-tool-edge3                         1/1     Running   0          5m39s   172.17.0.3     edge3   <none>           <none>
nginx-https-84c9fc57f8-x9h56           1/1     Running   0          4m35s   172.17.0.4     edge3   <none>           <none>
tcp-echo-deployment-85654c8b8f-l2zg6   1/1     Running   0          4m12s   172.17.0.5     edge3   <none>           <none>
websocket-test                         1/1     Running   0          7m1s    172.17.0.2     edge3   <none>           <none>
ws-edge-5978d75769-g7bwh               1/1     Running   0          3m25s   172.17.0.6     edge3   <none>           <none>

All pods are ready.
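
One thing visible in the listing above is that the edge pods use 172.17.0.x addresses and that some addresses repeat across nodes (for example, 172.17.0.3 appears on both edge1 and edge3). Whether this is related to the failure is not established in this thread. The sketch below only shows how such overlaps could be listed from `kubectl get po -o wide` output; the helper name `duplicate_pod_ips` is hypothetical, and it assumes the column layout shown above with a single-token RESTARTS column.

```python
from collections import defaultdict


def duplicate_pod_ips(listing):
    """Map each pod IP used more than once to its (pod, node) pairs.

    Assumes the column layout shown above (NAME READY STATUS RESTARTS AGE
    IP NODE ...) with a single-token RESTARTS column.
    """
    by_ip = defaultdict(list)
    for line in listing.strip().splitlines()[1:]:   # skip the header row
        cols = line.split()
        if len(cols) < 7:
            continue                                # ignore malformed rows
        name, ip, node = cols[0], cols[5], cols[6]
        by_ip[ip].append((name, node))
    return {ip: pods for ip, pods in by_ip.items() if len(pods) > 1}


# A few rows copied from the listing above.
sample = """\
NAME                                   READY   STATUS    RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
alpine-test                            1/1     Running   0          7m1s    10.234.65.14   node1   <none>           <none>
hostname-edge-5cd47b65d5-bnwxl         1/1     Running   0          2m50s   172.17.0.7     edge3   <none>           <none>
hostname-lb-edge-5cdf5c758c-75jmp      1/1     Running   0          2m12s   172.17.0.3     edge1   <none>           <none>
net-tool-edge3                         1/1     Running   0          5m39s   172.17.0.3     edge3   <none>           <none>
"""
for ip, pods in duplicate_pod_ips(sample).items():
    print(ip, pods)
```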

Then I executed curl hostname-svc:12345 in the alpine-test pod:

hostname-lb-edge-5cdf5c758c-pfkxx
/ # curl hostname-lb-svc:12345
hostname-lb-edge-5cdf5c758c-75jmp
/ # curl hostname-svc:12345
curl: (6) Could not resolve host: hostname-svc
/ # curl hostname-svc:12345
curl: (6) Could not resolve host: hostname-svc
/ # 

I wanted to see what would happen when running curl hostname-svc:12345 from another pod, so I created the net-tool-edge3 pod:

/ # curl hostname-lb-svc:12345
hostname-lb-edge-5cdf5c758c-pfkxx
/ # curl hostname-svc:12345
curl: (56) Recv failure: Connection reset by peer

As you can see, both alpine-test and net-tool-edge3 can access "hostname-lb-svc:12345", but for hostname-svc, alpine-test couldn't resolve the host and net-tool-edge3 reported a "connection reset by peer" error.

I should point out that hostname-svc was reachable at first; the problem started after I executed kubectl apply -f examples/hostname-lb-random.yaml.

Here is the hostname-svc information:

[root@shanghai edgemesh-1.12.0]# kubectl describe svc hostname-svc
Name:              hostname-svc
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          app=hostname-edge
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.234.62.186
IPs:               10.234.62.186
Port:              http-0  12345/TCP
TargetPort:        9376/TCP
Endpoints:         172.17.0.7:9376
Session Affinity:  None
Events:            <none>

In the alpine-test pod I accessed hostname-svc by its ClusterIP; it also reported "connection reset by peer":

/ # curl 10.234.62.186:12345
curl: (56) Recv failure: Connection reset by peer

Below is the log from edgemesh-agent on node1 (curl hostname-svc:12345):

I1013 16:56:38.961989       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:39.215957       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:39.216061       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.11/tcp/20006]} success
E1013 16:56:39.719907       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:39.720021       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:40.728933       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:40.729032       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:42.732067       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:42.732223       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:42.986407       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:42.986531       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:43.491382       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:43.491489       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:44.497737       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:44.497838       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:46.500768       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:46.500884       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:46.757748       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:46.757846       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:47.261121       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:47.261217       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:48.264556       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:48.264721       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:50.269990       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:50.270135       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:50.524043       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:50.524250       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:51.030072       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:51.030204       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.11/tcp/20006]} success
E1013 16:56:52.031481       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"
I1013 16:56:52.031584       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:56:54.033281       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"

There is no log from edgemesh-agent on edge3.
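
Because the node1 log repeats the same failure many times, it can help to condense it by error text. The following is a small Python sketch, assuming klog-style lines like the ones above in which the message is `"Dial failed" err="..."`; the helper name `summarize_dial_failures` is hypothetical.

```python
import re
from collections import Counter

# Matches the quoted error text at the end of a "Dial failed" log line.
DIAL_FAILED = re.compile(r'"Dial failed" err="(?P<err>.*)"\s*$')


def summarize_dial_failures(lines):
    """Count "Dial failed" entries, grouped by their err="..." text."""
    counts = Counter()
    for line in lines:
        match = DIAL_FAILED.search(line)
        if match:
            counts[match.group("err")] += 1
    return counts


# Lines copied from the node1 log above.
log_lines = [
    'E1013 16:56:39.215957       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"',
    'I1013 16:56:39.216061       1 tunnel.go:239] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.11/tcp/20006]} success',
    'E1013 16:56:39.719907       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from shanghai error: read conn result msg from shanghai err: stream reset"',
]
for err, count in summarize_dial_failures(log_lines).items():
    print(count, err)
```

The same function can be fed the full output of kubectl logs for an edgemesh-agent pod, split into lines.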

Below is the log from edgemesh-agent on node1 (curl 10.234.62.186:12345):

I1013 16:59:33.545830       1 tunnel.go:239] New stream between peer {12D3KooWRw7gVHcccnDsWsYtdMyipmg83zrLmNSNf4s1CCxHoayG: [/ip4/127.0.0.1/tcp/20006 /ip4/10.0.2.15/tcp/20006]} success
E1013 16:59:33.808234       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge3 error: libp2p dial {tcp edge3 172.17.0.7 9376} err: Proxy.type is FAILED"
I1013 16:59:33.808335       1 tunnel.go:239] New stream between peer {12D3KooWRw7gVHcccnDsWsYtdMyipmg83zrLmNSNf4s1CCxHoayG: [/ip4/127.0.0.1/tcp/20006 /ip4/10.0.2.15/tcp/20006]} success
E1013 16:59:34.320988       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge3 error: libp2p dial {tcp edge3 172.17.0.7 9376} err: Proxy.type is FAILED"
I1013 16:59:34.321127       1 tunnel.go:239] New stream between peer {12D3KooWRw7gVHcccnDsWsYtdMyipmg83zrLmNSNf4s1CCxHoayG: [/ip4/10.0.2.15/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:59:35.332130       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge3 error: libp2p dial {tcp edge3 172.17.0.7 9376} err: Proxy.type is FAILED"
I1013 16:59:35.332245       1 tunnel.go:239] New stream between peer {12D3KooWRw7gVHcccnDsWsYtdMyipmg83zrLmNSNf4s1CCxHoayG: [/ip4/10.0.2.15/tcp/20006 /ip4/127.0.0.1/tcp/20006]} success
E1013 16:59:37.343151       1 loadbalancer.go:683] "Dial failed" err="get proxy stream from edge3 error: libp2p dial {tcp edge3 172.17.0.7 9376} err: Proxy.type is FAILED"
E1013 16:59:37.343217       1 proxysocket.go:98] "Failed to connect to balancer" err="failed to connect to an endpoint"
I1013 16:59:40.258014       1 tunnel.go:118] [MDNS] Discovery found peer: {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.41/tcp/20006]}
I1013 16:59:40.258197       1 tunnel.go:130] [MDNS] New stream between peer {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.41/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]} success
I1013 16:59:40.261937       1 tunnel.go:175] Discovery service got a new stream from {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/10.22.46.41/tcp/20006]}
I1013 16:59:40.262034       1 tunnel.go:204] [MDNS] Discovery from edge1 : {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/10.22.46.41/tcp/20006]}
I1013 16:59:40.262073       1 tunnel.go:166] [MDNS] Discovery to edge1 : {12D3KooWC2TDds3pYR4HuNtrSdqezJx9wtvRoVHpsX59Gkp8YuUW: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.41/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]}
I1013 16:59:40.262122       1 tunnel.go:118] [MDNS] Discovery found peer: {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]}
I1013 16:59:40.262159       1 tunnel.go:130] [MDNS] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.11/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]} success
I1013 16:59:40.264758       1 tunnel.go:166] [MDNS] Discovery to shanghai : {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20006 /ip4/10.22.46.11/tcp/20006 /ip4/172.17.0.1/tcp/20006 /ip4/169.254.96.16/tcp/20006]}
I1013 16:59:40.264794       1 tunnel.go:118] [MDNS] Discovery found peer: {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/10.22.46.11/tcp/20006 /ip4/127.0.0.1/tcp/20006]}
I1013 16:59:40.264842       1 tunnel.go:130] [MDNS] New stream between peer {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20007 /ip4/10.22.46.11/tcp/20007 /ip4/172.17.0.1/tcp/20007 /ip4/169.254.96.16/tcp/20007 /ip4/10.234.64.0/tcp/20007]} success
I1013 16:59:40.267318       1 tunnel.go:166] [MDNS] Discovery to shanghai : {12D3KooWKZpxgSRMSMd6Favs6KBYbnBoA9X3fYq3dqgrKWQRWDvN: [/ip4/127.0.0.1/tcp/20007 /ip4/10.22.46.11/tcp/20007 /ip4/172.17.0.1/tcp/20007 /ip4/169.254.96.16/tcp/20007 /ip4/10.234.64.0/tcp/20007]}

The log from edgemesh-agent on edge3:

I1013 08:59:34.658794       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1013 08:59:34.659542       1 tunnel.go:369] max retries for dial
E1013 08:59:34.659560       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge3" ip:"172.17.0.7" port:9376  err: dial tcp 172.17.0.7:9376: connect: connection refused
I1013 08:59:34.920378       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1013 08:59:34.920961       1 tunnel.go:369] max retries for dial
E1013 08:59:34.920976       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge3" ip:"172.17.0.7" port:9376  err: dial tcp 172.17.0.7:9376: connect: connection refused
I1013 08:59:35.432928       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1013 08:59:35.434232       1 tunnel.go:369] max retries for dial
E1013 08:59:35.434255       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge3" ip:"172.17.0.7" port:9376  err: dial tcp 172.17.0.7:9376: connect: connection refused
I1013 08:59:36.443775       1 tunnel.go:289] Proxy service got a new stream from {12D3KooW9yF6fZvnMC3zxXSrxZSq1JM8otuwz7uTQ7FjNfKtFxVK: [/ip4/10.22.46.16/tcp/20006]}
E1013 08:59:36.444424       1 tunnel.go:369] max retries for dial
E1013 08:59:36.444443       1 tunnel.go:313] l4 proxy connect to type:CONNECT protocol:"tcp" node_name:"edge3" ip:"172.17.0.7" port:9376  err: dial tcp 172.17.0.7:9376: connect: connection refused

Here is another log from edgemesh-agent on edge3; this time both net-tool-edge3 and hostname-svc's endpoint are on the same node, edge3:

I1013 09:01:38.175587       1 log.go:184] [INFO] 172.17.0.3:59059 - 45954 "AAAA IN hostname-svc.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd 149 0.000104469s
I1013 09:01:38.175626       1 log.go:184] [INFO] 172.17.0.3:59059 - 45764 "A IN hostname-svc.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd 110 0.000146897s
E1013 09:01:38.175934       1 loadbalancer.go:683] "Dial failed" err="dial tcp 172.17.0.7:9376: connect: connection refused"
E1013 09:01:38.175979       1 loadbalancer.go:683] "Dial failed" err="dial tcp 172.17.0.7:9376: connect: connection refused"
E1013 09:01:38.176030       1 loadbalancer.go:683] "Dial failed" err="dial tcp 172.17.0.7:9376: connect: connection refused"
E1013 09:01:38.176145       1 loadbalancer.go:683] "Dial failed" err="dial tcp 172.17.0.7:9376: connect: connection refused"
E1013 09:01:38.176156       1 proxysocket.go:98] "Failed to connect to balancer" err="failed to connect to an endpoint"

bluven, Oct 13 '22 09:10

You can try telnet 172.17.0.7 9376 on edge3 to make sure the pod is working normally.
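
Where telnet is not available on the node, the same reachability check can be sketched in Python using only the standard library. The function name `tcp_port_open` and the 3-second timeout are illustrative choices, not part of EdgeMesh.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

Run on edge3, `tcp_port_open("172.17.0.7", 9376)` returning False would be consistent with the "connection refused" errors in the agent log, pointing at the pod or its container port rather than at the tunnel.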

Poorunga, Nov 01 '22 01:11