BUG: Should catch the decref error when the session has already been destroyed

Open ChengjieLi28 opened this issue 1 year ago • 0 comments

Note that the issue tracker is NOT the place for general support. For discussions about development, questions about usage, or any general questions, contact us on https://discuss.xorbits.io/.

Reproduce:

  1. Init a cluster:
import xorbits

xorbits.init(cuda_devices=[0])
  2. Run a task in another session, then destroy the session explicitly:
import xorbits

xorbits.init('<endpoint above>')

import xorbits.numpy as np

np.random.rand(10000, 10000).to_gpu()

xorbits.shutdown()
  3. Exit the process in which you ran the task above. The cluster (tornado) will then raise an error:
2023-04-21 07:58:04,500 xorbits._mars.services.web.core 111327 ERROR    ActorNotExist when handling request with LifecycleWebAPIHandler.decref_tileables
Traceback (most recent call last):
  File "/home/lichengjie/workspace/xorbits/python/xorbits/_mars/services/web/core.py", line 69, in wrapped
    res = await func(self, *args, **kwargs)
  File "/home/lichengjie/workspace/xorbits/python/xorbits/_mars/services/lifecycle/api/web.py", line 39, in decref_tileables
    await oscar_api.decref_tileables(tileable_keys, counts=counts)
  File "/home/lichengjie/workspace/xorbits/python/xorbits/_mars/services/lifecycle/api/oscar.py", line 108, in decref_tileables
    return await self._lifecycle_tracker_ref.decref_tileables(tileable_keys)
  File "xoscar/core.pyx", line 251, in xoscar.core.LocalActorRef.__getattr__
xoscar.errors.ActorNotExist: Actor b'2IC9l6dhZaChiD31uVp7EYKq_lifecycle_tracker' does not exist
2023-04-21 07:58:04,500 tornado.access 111327 ERROR    500 POST /api/session/2IC9l6dhZaChiD31uVp7EYKq/lifecycle?action=decref_tileables (127.0.0.1) 1.18ms

This error should be caught rather than propagated, since the resulting traceback and 500 response may cause confusion.
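
One possible shape for the fix, as a minimal sketch only: catch the missing-actor error around the decref call and treat it as "already cleaned up". To keep it runnable without a cluster, `ActorNotExist` below is a local stand-in for `xoscar.errors.ActorNotExist`, and `DestroyedTrackerRef` and `safe_decref_tileables` are hypothetical names that do not exist in xorbits. Where the real handling belongs (the web handler or the oscar API layer) is left open.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class ActorNotExist(Exception):
    """Local stand-in for xoscar.errors.ActorNotExist (assumption for this sketch)."""


class DestroyedTrackerRef:
    """Hypothetical actor ref whose session has already been destroyed."""

    async def decref_tileables(self, tileable_keys):
        raise ActorNotExist("lifecycle tracker actor does not exist")


async def safe_decref_tileables(tracker_ref, tileable_keys):
    """Decref tileables, treating a missing tracker as already cleaned up.

    Returns True if the decref ran, False if it was skipped.
    """
    try:
        await tracker_ref.decref_tileables(tileable_keys)
        return True
    except ActorNotExist:
        # The session is gone, so there is nothing left to release.
        logger.debug("Session already destroyed, skipping decref of %s", tileable_keys)
        return False


result = asyncio.run(safe_decref_tileables(DestroyedTrackerRef(), ["key-1"]))
```

With this pattern the late decref request from the exiting client would be logged at debug level instead of surfacing as an ERROR traceback.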

ChengjieLi28, Apr 21 '23 08:04