
Failed to establish a new connection: [Errno 60] Operation timed out on kubernetes

Open YogeshSomawar opened this issue 6 years ago • 3 comments

I am trying to deploy Clipper on Kubernetes, using the native Kubernetes offering on Google Cloud.

>>> from clipper_admin import ClipperConnection,KubernetesContainerManager
>>> clipper_conn = ClipperConnection(KubernetesContainerManager(useInternalIP=True))
>>> clipper_conn.start_clipper()
18-07-26:12:52:46 WARNING  [kubernetes_container_manager.py:231] No external node addresses found. Using Internal IP address
18-07-26:12:52:46 INFO     [kubernetes_container_manager.py:244] Found 1 nodes: 10.128.0.4
18-07-26:12:52:47 INFO     [kubernetes_container_manager.py:253] Setting Clipper mgmt port to 31340
18-07-26:12:52:47 INFO     [kubernetes_container_manager.py:261] Setting Clipper query port to 31974
18-07-26:12:52:48 INFO     [kubernetes_container_manager.py:276] Setting Clipper metric port to 31720
18-07-26:12:52:53 INFO     [clipper_admin.py:124] Clipper still initializing.
18-07-26:12:52:59 INFO     [clipper_admin.py:124] Clipper still initializing.
18-07-26:12:53:05 INFO     [clipper_admin.py:124] Clipper still initializing.
18-07-26:12:53:11 INFO     [clipper_admin.py:124] Clipper still initializing.
18-07-26:12:53:17 INFO     [clipper_admin.py:124] Clipper still initializing.
18-07-26:12:53:23 INFO     [clipper_admin.py:124] Clipper still initializing.
18-07-26:12:53:29 INFO     [clipper_admin.py:124] Clipper still initializing.

So, as per #537, I used connect() to connect to Clipper, but registering an application times out:

>>> clipper_conn.connect()
18-07-26:13:06:09 WARNING  [kubernetes_container_manager.py:231] No external node addresses found. Using Internal IP address
18-07-26:13:06:09 INFO     [kubernetes_container_manager.py:244] Found 1 nodes: 10.128.0.4
18-07-26:13:06:10 INFO     [kubernetes_container_manager.py:253] Setting Clipper mgmt port to 31340
18-07-26:13:06:10 INFO     [kubernetes_container_manager.py:261] Setting Clipper query port to 31974
18-07-26:13:06:11 INFO     [kubernetes_container_manager.py:276] Setting Clipper metric port to 31720
18-07-26:13:06:11 INFO     [clipper_admin.py:138] Successfully connected to Clipper cluster at 10.128.0.4:31974
>>>
>>>
>>> clipper_conn.register_application(name="hello-world", input_type="doubles", default_output="-1.0", slo_micros=100000)
Traceback (most recent call last):
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/util/connection.py", line 79, in create_connection
    raise err
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/util/connection.py", line 69, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/connection.py", line 196, in connect
    conn = self._new_conn()
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1139c3c50>: Failed to establish a new connection: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.128.0.4', port=31340): Max retries exceeded with url: /admin/add_app (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1139c3c50>: Failed to establish a new connection: [Errno 60] Operation timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/clipper_admin/clipper_admin.py", line 192, in register_application
    r = requests.post(url, headers=headers, data=req_json)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/Users/user/Project/pythonenv/lib/python3.6/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.128.0.4', port=31340): Max retries exceeded with url: /admin/add_app (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1139c3c50>: Failed to establish a new connection: [Errno 60] Operation timed out',))
>>>
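
The traceback bottoms out in a plain TCP connect timeout to 10.128.0.4:31340, so the same condition can be reproduced without Clipper. Below is a minimal sketch using only the standard library. `can_reach` is a hypothetical helper name, not part of clipper_admin, and the host and port are the ones from the log above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # A plain TCP connect, the same step that times out in the traceback above.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

Calling `can_reach("10.128.0.4", 31340)` from the machine that runs clipper_admin would show whether the management port is reachable at all. If it returns False, the problem is probably network reachability rather than anything in register_application itself.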

The pods were created on k8s:

Pro:~$  kubectl get pods -o wide
NAME                                   READY     STATUS    RESTARTS   AGE       IP          NODE
argo-ui-9bfc9f5c-9ql7b                 1/1       Running   0          1d        10.8.0.12   gke-vodafone-poc-default-pool-98b76b13-bfnm
metrics-7fcdf94575-94ccd               1/1       Running   0          27m       10.8.0.20   gke-vodafone-poc-default-pool-98b76b13-bfnm
mgmt-frontend-7f997b4cd5-5f7kt         1/1       Running   0          27m       10.8.2.18   gke-vodafone-poc-default-pool-98b76b13-pg3j
query-frontend-0-67c6d46495-dd798      2/2       Running   0          27m       10.8.0.19   gke-vodafone-poc-default-pool-98b76b13-bfnm
redis-fd8df4f7d-x9rdg                  1/1       Running   0          27m       10.8.1.16   gke-vodafone-poc-default-pool-98b76b13-mm2f
web-587b7fd7b4-dnmdd                   1/1       Running   0          1d        10.8.0.13   gke-vodafone-poc-default-pool-98b76b13-bfnm
workflow-controller-84d54f597d-shpcf   1/1       Running   0          1d        10.8.1.10   gke-vodafone-poc-default-pool-98b76b13-mm2f
Pro:~$
Pro:~$
Pro:~$ kubectl get service -o wide
NAME               CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
argo-ui            10.11.252.17    <nodes>       80:30712/TCP     1d        app=argo-ui
kubernetes         10.11.240.1     <none>        443/TCP          1d        <none>
metrics            10.11.251.234   <nodes>       9090:31720/TCP   27m       ai.clipper.name=metrics
mgmt-frontend      10.11.240.8     <nodes>       1338:31340/TCP   27m       ai.clipper.name=mgmt-frontend
query-frontend     10.11.248.88    <nodes>       1337:31974/TCP   27m       ai.clipper.name=query-frontend
query-frontend-0   10.11.252.48    <nodes>       7000:32598/TCP   27m       ai.clipper.name=query-frontend,ai.clipper.query_frontend.id=0
redis              10.11.243.186   <nodes>       6379:30959/TCP   27m       ai.clipper.name=redis
web                10.11.246.114   <nodes>       8080:32092/TCP   1d        run=web
Pro:~$
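
For reference, the PORT(S) column of a NodePort service reads `port:nodePort/protocol`, so the 31340 that clipper_admin logs as the mgmt port is the node port of mgmt-frontend. Here is a small sketch for pulling those numbers apart when comparing the service listing against the log. `PortSpec` and `parse_port_spec` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class PortSpec(NamedTuple):
    service_port: int
    node_port: int
    protocol: str


def parse_port_spec(spec: str) -> PortSpec:
    """Parse a NodePort entry such as '1338:31340/TCP' from the PORT(S) column."""
    ports, _, protocol = spec.partition("/")
    service_port, _, node_port = ports.partition(":")
    if not node_port:
        # Entries like '443/TCP' (ClusterIP services) have no node port.
        raise ValueError(f"no node port in {spec!r}")
    return PortSpec(int(service_port), int(node_port), protocol)


mgmt = parse_port_spec("1338:31340/TCP")
print(mgmt.node_port)  # 31340
```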

Could somebody assist me with this?

YogeshSomawar, Jul 26 '18 07:07

Hi, it looks like a problem with your frontends. Try running clipper_conn.stop_all() and then:

from clipper_admin import ClipperConnection, KubernetesContainerManager
clipper_conn = ClipperConnection(KubernetesContainerManager(useInternalIP=True))
clipper_conn.start_clipper()

A new cluster should be deployed.

agneeshdg, Jul 31 '18 10:07

Hi @agneeshdg, thanks for your comments; however, the problem is not resolved. As I am running this on K8s, do I need to use a specific pod IP to establish the connection?

~Yogesh
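
This does not answer the pod-IP question, but one check may help frame it. The node address, the pod IP, and the cluster IP shown earlier in this thread are all in private address space. Such addresses are usually reachable only from inside the same network, or over a VPN or similar link. The traceback paths suggest the client is a macOS machine, which may well sit outside the cluster network. The sketch below uses the standard `ipaddress` module to confirm the classification. It is only a sanity check under that assumption, not a diagnosis.

```python
import ipaddress

# Addresses taken from the log and kubectl output in this thread.
addresses = {
    "node (internal IP)": "10.128.0.4",
    "mgmt-frontend pod": "10.8.2.18",
    "mgmt-frontend cluster IP": "10.11.240.8",
}

for label, addr in addresses.items():
    ip = ipaddress.ip_address(addr)
    # is_private is True for RFC 1918 ranges such as 10.0.0.0/8.
    print(f"{label}: {addr} private={ip.is_private}")
```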

YogeshSomawar, Aug 07 '18 10:08

@simon-mo @YogeshSomawar @agneeshdg I also tried to run this on AWS EC2 k8s, but it failed on the health check. Have you had similar problems? [image]

wcwang07, Feb 15 '19 02:02