
BUG: cuda_devices does not take effect when initializing another session that connects to an existing cluster

Open ChengjieLi28 opened this issue 1 year ago • 0 comments

Note that the issue tracker is NOT the place for general support. For discussions about development, questions about usage, or any general questions, contact us on https://discuss.xorbits.io/.

Reproduce: With multiple GPUs, the following steps currently lead to a deadlock.

  1. Initialize a local cluster:
     xorbits.init()
     Get the endpoint from the console log output.

  2. Initialize another session in another console process:
     xorbits.init(endpoint=<endpoint above>, cuda_devices=[0])

Then submit a task to the newly initialized session; a deadlock occurs. This suggests that cuda_devices=[0] does not take effect for the second session.
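Since the failure mode is a hang, it may help to run the second step under a watchdog so the reproduction reports a timeout instead of blocking the console. The following is only a sketch under stated assumptions: `run_with_timeout` and `reproduce` are hypothetical helper names, the endpoint is passed positionally on the assumption that the first parameter of `xorbits.init` is the cluster address, and the small `xorbits.numpy` sum with `xorbits.run` is just a stand-in for "submit a task".

```python
import threading


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; return ("ok", value) or ("timeout", None).

    Hypothetical helper, not part of xorbits. A daemon thread is used so a
    hung call does not keep the interpreter alive at exit.
    """
    result = {}

    def target():
        try:
            result["value"] = fn()
        except BaseException as exc:  # re-raised in the caller below
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return ("timeout", None)
    if "error" in result:
        raise result["error"]
    return ("ok", result["value"])


def reproduce(endpoint, timeout_s=60.0):
    """Step 2 of the report: connect to an existing cluster and submit a task.

    Assumption: the endpoint is accepted as the first positional argument of
    xorbits.init; cuda_devices=[0] is taken from the report.
    """
    import xorbits
    import xorbits.numpy as xnp

    xorbits.init(endpoint, cuda_devices=[0])

    def task():
        # Placeholder workload; any task submitted to the session would do.
        return xorbits.run(xnp.ones((10,)).sum())

    return run_with_timeout(task, timeout_s)
```

Calling `reproduce("<endpoint above>")` in the second console would then return `("timeout", None)` if the task hangs as described, rather than blocking indefinitely.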

ChengjieLi28 · Apr 13 '23 06:04