[BUG] Error when accessing the Mars web dashboard in Ray DAG mode
Describe the bug
When viewing task status in the Mars dashboard while running in Ray DAG mode, the status shown is incorrect. The following task has finished, so its graph should be green rather than blank:
To Reproduce To help us reproduce this bug, please provide the information below:
- Your Python version: 3.8
- The version of Mars you use: https://github.com/mars-project/mars/pull/3165
- Versions of crucial packages, such as numpy, scipy and pandas: pandas 1.4.2, numpy 1.19.5, scipy 1.8.1
- Full stack of the error.
(RayMainPool pid=54669) 2022-06-27 16:06:15,283 ERROR web.py:2239 -- 500 GET /api/session/SFJWHJesbcMFjANMcqTskZ6R/task/iQvlLWn2zC9z63HSWWbw5maQ/tileable_detail (127.0.0.1) 5.22ms
^C(RayMainPool pid=54669) 2022-06-27 16:06:16,283 ERROR core.py:82 -- TypeError when handling request with TaskWebAPIHandler.get_tileable_details
(RayMainPool pid=54669) Traceback (most recent call last):
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/web/core.py", line 70, in wrapped
(RayMainPool pid=54669) res = await self._create_or_get_url_future(
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/task/api/web.py", line 132, in get_tileable_details
(RayMainPool pid=54669) res = await oscar_api.get_tileable_details(task_id)
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/task/api/oscar.py", line 77, in get_tileable_details
(RayMainPool pid=54669) return await self._task_manager_ref.get_tileable_details(task_id)
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 263, in __pyx_actor_method_wrapper
(RayMainPool pid=54669) async with lock:
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 266, in mars.oscar.core.__pyx_actor_method_wrapper
(RayMainPool pid=54669) result = await result
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/task/supervisor/manager.py", line 206, in get_tileable_details
(RayMainPool pid=54669) return await processor_ref.get_tileable_details()
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/oscar/backends/context.py", line 196, in send
(RayMainPool pid=54669) return self._process_result_message(result)
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/oscar/backends/context.py", line 76, in _process_result_message
(RayMainPool pid=54669) raise message.as_instanceof_cause()
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/oscar/backends/pool.py", line 586, in send
(RayMainPool pid=54669) result = await self._run_coro(message.message_id, coro)
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/oscar/backends/pool.py", line 343, in _run_coro
(RayMainPool pid=54669) return await coro
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/oscar/api.py", line 120, in __on_receive__
(RayMainPool pid=54669) return await super().__on_receive__(message)
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 523, in __on_receive__
(RayMainPool pid=54669) raise ex
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 516, in mars.oscar.core._BaseActor.__on_receive__
(RayMainPool pid=54669) return await self._handle_actor_result(result)
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 401, in _handle_actor_result
(RayMainPool pid=54669) task_result = await coros[0]
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 444, in mars.oscar.core._BaseActor._run_actor_async_generator
(RayMainPool pid=54669) async with self._lock:
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 445, in mars.oscar.core._BaseActor._run_actor_async_generator
(RayMainPool pid=54669) with debug_async_timeout('actor_lock_timeout',
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 450, in mars.oscar.core._BaseActor._run_actor_async_generator
(RayMainPool pid=54669) res = await gen.athrow(*res)
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/task/supervisor/task.py", line 159, in get_tileable_details
(RayMainPool pid=54669) tileable_to_details = yield asyncio.to_thread(self._get_tileable_infos)
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 455, in mars.oscar.core._BaseActor._run_actor_async_generator
(RayMainPool pid=54669) res = await self._handle_actor_result(res)
(RayMainPool pid=54669) File "mars/oscar/core.pyx", line 375, in _handle_actor_result
(RayMainPool pid=54669) result = await result
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/lib/aio/_threads.py", line 36, in to_thread
(RayMainPool pid=54669) return await loop.run_in_executor(None, func_call)
(RayMainPool pid=54669) File "/Users/chaokunyang/anaconda3/envs/mars3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
(RayMainPool pid=54669) result = self.fn(*self.args, **self.kwargs)
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/task/supervisor/task.py", line 100, in _get_tileable_infos
(RayMainPool pid=54669) subtask_id_to_results = self._get_all_subtask_results()
(RayMainPool pid=54669) File "/Users/chaokunyang/Desktop/chaokun/python/mars/mars/services/task/supervisor/task.py", line 64, in _get_all_subtask_results
(RayMainPool pid=54669) for stage in processor.stage_processors:
(RayMainPool pid=54669) TypeError: [address=ray://ray-cluster-1656317089/0/0, pid=54669] 'NoneType' object is not iterable
(RayMainPool pid=54669) 2022-06-27 16:06:16,289 ERROR web.py:2239 -- 500 GET /api/session/SFJWHJesbcMFjANMcqTskZ6R/task/iQvlLWn2zC9z63HSWWbw5maQ/tileable_detail (127.0.0.1) 8.59ms
- Minimized code to reproduce the error.
pytest -v -s mars/deploy/oscar/tests/test_ray_dag_oscar.py::test_iterative_tiling
with the test defined as:
@require_ray
@pytest.mark.asyncio
async def test_iterative_tiling(ray_start_regular_shared2, create_cluster):
    await test_local.test_iterative_tiling(create_cluster)
    time.sleep(100000)
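The TypeError at the bottom of the traceback comes from iterating over processor.stage_processors while it is None. The sketch below reproduces that failure in isolation and shows one possible guard. It is not the Mars implementation: FakeStageProcessor, FakeProcessor, and both helper functions are hypothetical stand-ins, and the assumption that stage_processors is left as None in Ray DAG mode is inferred from the traceback only.

```python
from typing import Dict, List, Optional


class FakeStageProcessor:
    """Hypothetical stand-in for a stage processor holding subtask results."""

    def __init__(self, subtask_results: Dict[str, str]):
        self.subtask_results = subtask_results


class FakeProcessor:
    """Hypothetical stand-in for a task processor.

    stage_processors may be None, which is what the traceback suggests
    happens in Ray DAG mode (an assumption, not confirmed by the source).
    """

    def __init__(self, stage_processors: Optional[List[FakeStageProcessor]]):
        self.stage_processors = stage_processors


def collect_results_unguarded(processors: List[FakeProcessor]) -> Dict[str, str]:
    """Mirrors the failing loop: raises TypeError if stage_processors is None."""
    results: Dict[str, str] = {}
    for processor in processors:
        for stage in processor.stage_processors:
            results.update(stage.subtask_results)
    return results


def collect_results_guarded(processors: List[FakeProcessor]) -> Dict[str, str]:
    """Treats a missing stage list as empty instead of iterating over None."""
    results: Dict[str, str] = {}
    for processor in processors:
        for stage in processor.stage_processors or []:
            results.update(stage.subtask_results)
    return results


if __name__ == "__main__":
    broken = [FakeProcessor(None)]
    try:
        collect_results_unguarded(broken)
    except TypeError as exc:
        print("unguarded:", exc)
    print("guarded:", collect_results_guarded(broken))
```

A guard like this would likely only remove the 500 response. For the graph to turn green, the dashboard would still need subtask status from whatever executor Ray DAG mode uses.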