
Improve Ray client error message when exception can't be unpickled

Open · pcmoritz opened this issue 1 year ago • 1 comment

Why are these changes needed?

This can happen, for example, when there is a version mismatch between the libraries installed on the client and on the cluster. Even in these cases, we should still try to surface the original error message.
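A minimal sketch of the fallback idea, assuming the serialized exception arrives together with a plain-text rendering of the original error. `deserialize_exception` and its parameters are hypothetical names, not Ray's client API; the message text is modeled on the output quoted later in this thread.

```python
import pickle


def deserialize_exception(data: bytes, error_message: str) -> BaseException:
    """Hypothetical helper: rebuild an exception sent by a remote worker.

    `data` is the pickled exception; `error_message` is assumed to be a
    plain-text rendering of the same error that was sent alongside it.
    """
    try:
        return pickle.loads(data)
    except Exception as unpickle_error:
        # Unpickling can fail when the exception's class is missing or has
        # changed on this side, e.g. because of a library version mismatch.
        # Fall back to a generic error that still carries the original text.
        fallback = RuntimeError(
            "Failed to unpickle serialized exception -- "
            f"original error message is: {error_message}"
        )
        # Keep the unpickling failure attached for debugging.
        fallback.__cause__ = unpickle_error
        return fallback


# The caller raises whatever comes back, so users always see some message.
err = deserialize_exception(b"", "ValueError: boom")
print(type(err).__name__, err)
```

Returning the exception instead of raising it keeps the helper usable both for raising on the client and for logging; either way, the original message is never silently dropped.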

Related issue number

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

pcmoritz · Sep 22 '22

For me, this significantly improves the error message when there is a version conflict:

Failed to unpickle serialized exception -- original error message is: ray.exceptions.RayTaskError: ray::begin_task_run() (pid=589, ip=172.31.76.169)
  File "/Users/pcmoritz/anaconda3/envs/py38/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 212, in wrapper
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 141, in run_async_in_new_loop
    return anyio.run(partial(__fn, *args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/ray/anaconda3/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/home/ray/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/engine.py", line 1103, in begin_task_run
    return await orchestrate_task_run(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/engine.py", line 1186, in orchestrate_task_run
    state = await propose_state(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/engine.py", line 1463, in propose_state
    response = await client.set_task_run_state(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/client.py", line 1826, in set_task_run_state
    response = await self._client.post(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/httpx/_client.py", line 1842, in post
    return await self.request(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/httpx/_client.py", line 1527, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/client.py", line 279, in send
    response.raise_for_status()
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/client.py", line 225, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Client error '404 Not Found' for url 'http://ephemeral-orion/api/task_runs/cbf4695a-1608-465c-8692-f88099d6383d/set_state'
Response: {'exception_message': 'Task run with id cbf4695a-1608-465c-8692-f88099d6383d not found'}
For more information check: https://httpstatuses.com/404

17:30:16.623 | INFO    | Task run 'wait-5a2aff11-0' - Crash detected! Execution was interrupted by an unexpected exception.
17:30:18.584 | INFO    | Flow run 'aboriginal-ringtail' - Finished in state Completed()
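To reproduce this kind of failure without a real version conflict, one can pickle an exception whose class lives in a module that the deserializing side lacks. This is a sketch under stated assumptions: `fakelib`, `LibraryError` and `simulate_version_mismatch` are hypothetical names, and a missing module is only one way a mismatch can break unpickling (a renamed class or an exception whose constructor signature changed are others).

```python
import pickle
import sys
import types


def simulate_version_mismatch() -> Exception:
    """Pickle an exception whose class lives in a module that the
    deserializing side does not have, then try to unpickle it."""
    # "fakelib" is a hypothetical stand-in for a library that is installed
    # where the task runs but missing on the side that deserializes.
    fakelib = types.ModuleType("fakelib")

    class LibraryError(Exception):
        pass

    # Make the class look like a top-level class of fakelib, since pickle
    # stores classes by reference (module name + qualified name).
    LibraryError.__module__ = "fakelib"
    LibraryError.__qualname__ = "LibraryError"
    fakelib.LibraryError = LibraryError
    sys.modules["fakelib"] = fakelib
    try:
        payload = pickle.dumps(LibraryError("Task run not found"))
    finally:
        # "Uninstall" the library before deserializing.
        del sys.modules["fakelib"]

    try:
        pickle.loads(payload)
    except Exception as e:  # pickle fails while importing fakelib
        return e
    raise AssertionError("unpickling unexpectedly succeeded")


err = simulate_version_mismatch()
print(type(err).__name__, err)
```

Unpickling fails before any user code sees the original exception object, which is why a fallback like the one this PR describes needs some plain-text form of the error to be available independently of the pickled payload.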

pcmoritz · Sep 22 '22

Although the failures look like they might be related, these tests are actually flaky on master :)

pcmoritz · Sep 27 '22