old_issues_repo

upstream connect error or disconnect/reset before headers

Open · adamzhoul opened this issue 6 years ago · 8 comments

env

istioctl:      0.6
kubectl:       1.9
install from:  GitHub
install file:  ./install/kubernetes/istio.yaml
example:       ./samples/bookinfo/kube/bookinfo.yaml

problem

I followed the instructions and installed the bookinfo sample.
When I visit http://myip:30144/productpage, it shows:
        upstream connect error or disconnect/reset before headers 

status of ingress

The ingress pod log shows: lds: fetch failure: network error
I logged into the ingress pod and ran: curl 127.0.0.1:15000/stats
Part of the result:
    cluster.out.productpage.istio-test.svc.cluster.local|http|version=v1.internal.upstream_rq_503: 6
    cluster.out.productpage.istio-test.svc.cluster.local|http|version=v1.internal.upstream_rq_5xx: 6
    cluster.rds.internal.upstream_rq_503: 247
    cluster.rds.internal.upstream_rq_5xx: 247
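The counters above are easier to compare once they are grouped by cluster. A minimal sketch, assuming only the `cluster.<name>[.internal|.external].upstream_rq_5xx` naming visible in this output (`upstream_5xx` is a hypothetical helper, not part of Envoy or Istio), that pulls those counters out of the `/stats` text. Cluster names contain dots and `|`, so it strips the known prefix and suffix instead of splitting on `.`:

```python
import re

# One Envoy admin /stats line: "<stat name>: <integer value>".
STAT_LINE = re.compile(r"^\s*(\S+):\s+(\d+)\s*$")
PREFIX, SUFFIX = "cluster.", ".upstream_rq_5xx"


def upstream_5xx(stats_text: str) -> dict[tuple[str, str], int]:
    """Map (cluster name, origin) -> upstream_rq_5xx counter value.

    origin is "internal", "external", or "" when the stat has no origin
    segment. Per-code counters such as upstream_rq_503 are ignored.
    """
    result: dict[tuple[str, str], int] = {}
    for line in stats_text.splitlines():
        m = STAT_LINE.match(line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2))
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue
        # Whatever sits between the prefix and suffix is "<cluster>[.<origin>]".
        body = name[len(PREFIX):-len(SUFFIX)]
        origin = ""
        for candidate in ("internal", "external"):
            if body.endswith("." + candidate):
                body, origin = body[: -len(candidate) - 1], candidate
                break
        result[(body, origin)] = value
    return result
```

On the four lines above this yields 247 for the `rds` cluster and 6 for the productpage cluster; the `upstream_rq_503` lines are skipped because only the `5xx` class counter is matched. The origin is kept in the key so that, if both a plain and an `internal` variant of the same counter are present, they are not summed twice.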

status of pilot

In the istio-pilot pod, istio-proxy container, I ran tcpflow: tcpflow -cp | grep -A 5 GET
Part of the result:
      010.233.069.199.56086-010.233.069.190.15003: GET /v1/registration/productpage.istio-test.svc.cluster.local|http|version=v1 HTTP/1.1
      host: rds
      x-envoy-internal: true
      x-forwarded-for: 10.233.69.199
      x-envoy-expected-rq-timeout-ms: 1000
      content-length: 0
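Envoy's request names the service as `<fqdn>|<port name>|<labels>`, while the curl below asks for the bare FQDN. A minimal sketch, assuming that three-part layout exactly as it appears in this capture (`ServiceKey`, `to_path` and `from_path` are hypothetical names), for building and splitting these registration paths so a manual test can request the same key Envoy did. The labels field is kept as a pre-formatted string because only the single-label form `version=v1` appears in this thread:

```python
from typing import NamedTuple

REG_PREFIX = "/v1/registration/"


class ServiceKey(NamedTuple):
    """Parts of an SDS v1 service name, as seen in the tcpflow capture."""
    fqdn: str
    port_name: str = ""
    labels: str = ""  # pre-formatted, e.g. "version=v1" (assumption)


def to_path(key: ServiceKey) -> str:
    """Build the registration path; trailing empty parts are dropped."""
    parts = [key.fqdn, key.port_name, key.labels]
    # A bare FQDN should not end in stray '|' separators.
    while len(parts) > 1 and parts[-1] == "":
        parts.pop()
    return REG_PREFIX + "|".join(parts)


def from_path(path: str) -> ServiceKey:
    """Split a registration path back into its '|'-separated parts."""
    if not path.startswith(REG_PREFIX):
        raise ValueError(f"not an SDS v1 registration path: {path!r}")
    fields = path[len(REG_PREFIX):].split("|")
    if len(fields) > 3:
        raise ValueError(f"unexpected number of '|' fields: {path!r}")
    return ServiceKey(*fields)
```

For example, `to_path(ServiceKey("productpage.istio-test.svc.cluster.local", "http", "version=v1"))` reproduces the path from the capture above.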

This shows that Envoy in the ingress can't get data from Pilot. However, when I run:

  curl -v 127.0.0.1:8080/v1/registration/productpage.istio-test.svc.cluster.local

*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /v1/registration/productpage.istio-test.svc.cluster.local HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK 
< Date: Tue, 03 Apr 2018 11:35:12 GMT
< Content-Length: 18
< Content-Type: text/plain; charset=utf-8
<
{
    "hosts": []
}
* Connection #0 to host 127.0.0.1 left intact
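`{"hosts": []}` is a well-formed SDS v1 reply that simply lists no endpoints. A small sketch (the helper `sds_endpoints` is hypothetical) that turns such a body into `ip:port` strings, using the `ip_address` and `port` fields that appear in the non-empty reply later in this thread:

```python
import json


def sds_endpoints(body: str) -> list[str]:
    """Return "ip:port" strings from an SDS v1 registration response body."""
    doc = json.loads(body)
    hosts = doc.get("hosts")
    if not isinstance(hosts, list):
        raise ValueError("response has no 'hosts' list")
    return [f"{h['ip_address']}:{h['port']}" for h in hosts]
```

Note that this query used the bare FQDN rather than the `|http|version=v1` key captured by tcpflow, so an empty list here does not by itself show what Pilot returns for the key Envoy requested.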

ss -tnlp|grep 15003
LISTEN     0      128          *:15003                    *:*                   users:(("envoy",pid=1464,fd=366))
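To confirm which process owns a port, the `users:` column of `ss -tnlp` is enough. A sketch (the `Listener` type and `parse_ss_line` are hypothetical; it assumes the column order shown above: state, Recv-Q, Send-Q, local address, peer address, process):

```python
import re
from typing import NamedTuple, Optional


class Listener(NamedTuple):
    port: int
    process: str
    pid: int


# Local address column ends in ":<port>" (works for "*:15003" and "[::]:15003").
_LOCAL = re.compile(r":(\d+)$")
# Process column looks like users:(("envoy",pid=1464,fd=366)).
_USERS = re.compile(r'users:\(\("([^"]+)",pid=(\d+)')


def parse_ss_line(line: str) -> Optional[Listener]:
    """Parse one LISTEN row of `ss -tnlp`; return None for anything else."""
    cols = line.split()
    if len(cols) < 4 or cols[0] != "LISTEN":
        return None
    local = _LOCAL.search(cols[3])
    users = _USERS.search(line)
    if not (local and users):
        return None
    return Listener(int(local.group(1)), users.group(1), int(users.group(2)))
```

For the line above it returns port 15003 owned by `envoy` (pid 1464), i.e. the listener on 15003 belongs to Envoy, not to the discovery server answering on 8080.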

This means the discovery service itself is fine, but pilot-agent does not work correctly: the Envoy on port 15003 listens but returns no data. So what happened?

P.S. All the bookinfo services are in fact working fine; the only problem is the discovery issue described here.

adamzhoul, Apr 03 '18 11:04

Checking the pilot discovery log, I get: warn unable to retrieve region label for pod: productpage-v1-777c468697-4c7pq
Is that the reason? Why does it happen?

adamzhoul, Apr 03 '18 13:04

Do you mean the AZ label for the pod (rather than region) in the log? cc @liamawhite If it is AZ, it is harmless for a non-auth Istio install.

linsun, Apr 04 '18 19:04

Also, can you check whether your productpage pod is running fine?

linsun, Apr 04 '18 19:04

Yeah, the warning is harmless. It won't have any effect on routing.

liamawhite, Apr 04 '18 20:04

Thank you @linsun @liamawhite. Each pod runs fine; they just can't communicate with Pilot. What else should I check?

One more thing: from the tcpflow output I noticed that Istio uses the Envoy v1 API. But when I curl http://pilotip:15003/v1/registration/.., the response is: Connection was reset.
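A reset, a refused connection and a timeout point at different layers, so it helps to tell them apart when probing port 15003. A minimal sketch using only Python's standard `http.client` (the `probe` function and its return strings are hypothetical), for plain HTTP only (no TLS):

```python
import http.client
import socket


def probe(host: str, port: int, path: str, timeout: float = 2.0) -> str:
    """Issue one plain-HTTP GET and classify the outcome.

    Returns "http <status>", "refused", "reset", or "timeout".
    Other errors (e.g. DNS failures) propagate to the caller.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return f"http {resp.status}"
    except ConnectionRefusedError:
        return "refused"  # nothing is listening on host:port
    except ConnectionResetError:
        # Also covers http.client.RemoteDisconnected (a subclass): the peer
        # accepted the TCP connection but closed it without a response.
        return "reset"
    except socket.timeout:
        return "timeout"
    finally:
        conn.close()
```

From inside a pod this could be called as `probe("istio-pilot", 15003, "/v1/registration/productpage.bookinfo.svc.cluster.local|http")`; "reset" would match the behaviour reported here, while "refused" would instead mean nothing is listening.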

P.S. I have encountered this problem twice. The first time, I restarted the pilot pod and everything worked again. The second time, it suddenly went wrong; instead of restarting, I tried to figure out what happened, but after several hours it suddenly started working again on its own.

adamzhoul, Apr 07 '18 05:04

I have the same issue. The Pilot service looks good:

root@istio-ingress-7b7cdd577b-jqqhd:/# curl http://istio-pilot:15003/v1/registration/"productpage.bookinfo.svc.cluster.local|http"
{
  "hosts": [
   {
    "ip_address": "10.65.83.44",
    "port": 9080
   }
  ]
 }

The productpage istio-proxy log shows "lds: fetch failure: network error":

kubectl logs po/productpage-v1-d46b58748-c77mt istio-proxy -n bookinfo

[2018-05-02 09:17:55.337][14][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error
2018-05-02T09:18:16.975099Z	info	Proxy availability zone: 
2018-05-02T09:18:46.977422Z	info	Proxy availability zone: 
[2018-05-02 09:18:50.140][14][info][main] external/envoy/source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-05-02 09:19:12.472][14][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error
2018-05-02T09:19:16.979992Z	info	Proxy availability zone: 
2018-05-02T09:19:46.985010Z	info	Proxy availability zone: 
2018-05-02T09:20:16.990356Z	info	Proxy availability zone: 
[2018-05-02 09:20:23.998][14][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error
2018-05-02T09:20:47.003079Z	info	Proxy availability zone: 
2018-05-02T09:21:17.015069Z	info	Proxy availability zone: 
2018-05-02T09:21:47.019058Z	info	Proxy availability zone: 
[2018-05-02 09:22:03.265][14][info][upstream] external/envoy/source/server/lds_subscription.cc:70] lds: fetch failure: network error

BTW, Istio runs in a large cluster with lots of services. Previously, Pilot just returned "connection reset by peer"; after scaling it to 8 instances, we started seeing the "upstream connect error or disconnect/reset before headers" issue.

wangyf2010, May 02 '18 09:05

@adamzhoul, @wangyf2010, could you please upgrade to 0.8 and let us know if you still see this issue?

sakshigoel12, Jun 05 '18 00:06

Hello, I encountered the same error in version 1.0: visiting the product page only returns a 503 error.

bretagne-peiqi, Sep 19 '18 08:09