
[query] Failures to communicate with the spark/local backend result in cryptic error message

Open daniel-goldstein opened this issue 1 month ago • 0 comments

What happened?

Hail propagates clearly explained error messages from Java to Python when an exception is thrown in the user's pipeline. However, the Hail Python front end does not handle the case where the Java backend disappears entirely, which can happen when the OOM killer kills the JVM. The result is the error shown below. In such a scenario, the Python front end should add a useful message stating that the backend is not reachable and might have run out of memory.

Version

0.2.130

Relevant log output

File ~/Library/Python/3.9/lib/python/site-packages/hail/table.py:2814, in Table.collect(self, _localize, _timed)
2812 e = construct_expr(rows_ir, hl.tarray(t.row.dtype))
2813 if _localize:
→ 2814 return Env.backend().execute(e._ir, timed=_timed)
2815 else:
2816 return e

File ~/Library/Python/3.9/lib/python/site-packages/hail/backend/backend.py:188, in Backend.execute(self, ir, timed)
186 payload = ExecutePayload(self._render_ir(ir), '{"name":"StreamBufferSpec"}', timed)
187 try:
→ 188 result, timings = self._rpc(ActionTag.EXECUTE, payload)
189 except FatalError as e:
190 raise e.maybe_user_error(ir) from None

File ~/Library/Python/3.9/lib/python/site-packages/hail/backend/py4j_backend.py:218, in Py4JBackend._rpc(self, action, payload)
216 path = action_routes[action]
217 port = self._backend_server_port
→ 218 resp = self._requests_session.post(f'http://localhost:{port}{path}', data=data)
219 if resp.status_code >= 400:
220 error_json = orjson.loads(resp.content)

File ~/Library/Python/3.9/lib/python/site-packages/requests/sessions.py:637, in Session.post(self, url, data, json, **kwargs)
626 def post(self, url, data=None, json=None, **kwargs):
627 r"""Sends a POST request. Returns :class:Response object.
628
629 :param url: URL for the new :class:Request object.
(…)
634 :rtype: requests.Response
635 """
→ 637 return self.request("POST", url, data=data, json=json, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
584 send_kwargs = {
585 "timeout": timeout,
586 "allow_redirects": allow_redirects,
587 }
588 send_kwargs.update(settings)
→ 589 resp = self.send(prep, **send_kwargs)
591 return resp

File ~/Library/Python/3.9/lib/python/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
700 start = preferred_clock()
702 # Send the request
→ 703 r = adapter.send(request, **kwargs)
705 # Total elapsed time of the request (approximately)
706 elapsed = preferred_clock() - start

File ~/Library/Python/3.9/lib/python/site-packages/requests/adapters.py:501, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
486 resp = conn.urlopen(
487 method=request.method,
488 url=url,
(…)
497 chunked=chunked,
498 )
500 except (ProtocolError, OSError) as err:
→ 501 raise ConnectionError(err, request=request)
503 except MaxRetryError as e:
504 if isinstance(e.reason, ConnectTimeoutError):
505 # TODO: Remove this in 3.0.0: see #2811

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

daniel-goldstein, May 22 '24 15:05