ray
Improve Ray client error message when exception can't be unpickled
Why are these changes needed?
Unpickling can fail, for example, when there is a version mismatch between the libraries used. We should still try to surface the original error message in these cases.
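The fallback idea can be sketched as follows. This is a minimal illustration, not the code in this PR: `load_exception` and `fallback_message` are hypothetical names, and the sketch assumes a plain-text rendering of the error is available alongside the pickled exception.

```python
import pickle


def load_exception(serialized: bytes, fallback_message: str) -> Exception:
    """Return the unpickled exception, or a RuntimeError carrying the original text.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    try:
        return pickle.loads(serialized)
    except Exception:
        # Unpickling can fail when the exception class (or one of its
        # dependencies) is missing or differs on this side, e.g. because of
        # a library version mismatch. Surface the text instead of losing it.
        return RuntimeError(
            "Failed to unpickle serialized exception -- "
            f"original error message is: {fallback_message}"
        )


# A protocol-0 pickle that references a module which does not exist here,
# standing in for an exception class that is unavailable on this side.
unloadable = b"cno_such_module_xyz\nMissingError\n."
print(load_exception(unloadable, "MissingError: details"))
```

Returning a generic exception that embeds the original text keeps the caller's error path unchanged while still showing the remote traceback.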
Related issue number
Checks
- [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [ ] I've run scripts/format.sh to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(
This significantly improves the error message for me if there is a version conflict:
Failed to unpickle serialized exception -- original error message is: ray.exceptions.RayTaskError: ray::begin_task_run() (pid=589, ip=172.31.76.169)
  File "/Users/pcmoritz/anaconda3/envs/py38/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 212, in wrapper
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/utilities/asyncutils.py", line 141, in run_async_in_new_loop
    return anyio.run(partial(__fn, *args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/home/ray/anaconda3/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/home/ray/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/engine.py", line 1103, in begin_task_run
    return await orchestrate_task_run(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/engine.py", line 1186, in orchestrate_task_run
    state = await propose_state(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/engine.py", line 1463, in propose_state
    response = await client.set_task_run_state(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/client.py", line 1826, in set_task_run_state
    response = await self._client.post(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/httpx/_client.py", line 1842, in post
    return await self.request(
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/httpx/_client.py", line 1527, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/client.py", line 279, in send
    response.raise_for_status()
  File "/tmp/ray/session_2022-09-21_17-29-11_860271_150/runtime_resources/pip/78966d9d98feb88feb10c67481827dce43697301/virtualenv/lib/python3.8/site-packages/prefect/client.py", line 225, in raise_for_status
    raise PrefectHTTPStatusError.from_httpx_error(exc) from exc.__cause__
prefect.exceptions.PrefectHTTPStatusError: Client error '404 Not Found' for url 'http://ephemeral-orion/api/task_runs/cbf4695a-1608-465c-8692-f88099d6383d/set_state'
Response: {'exception_message': 'Task run with id cbf4695a-1608-465c-8692-f88099d6383d not found'}
For more information check: https://httpstatuses.com/404
17:30:16.623 | INFO | Task run 'wait-5a2aff11-0' - Crash detected! Execution was interrupted by an unexpected exception.
17:30:18.584 | INFO | Flow run 'aboriginal-ringtail' - Finished in state Completed()
Even though the test failures look like they might be related, those tests are actually flaky on master :)