dask-cuda

Multiple Distributed IO threads cause hang with UCX

Open pentschev opened this issue 3 years ago • 2 comments

In Distributed it's possible, but not common, to have multiple IO threads. I would argue this is generally not a supported use case, but having, for instance, multiple Clients in the same process means UCX-Py is accessed from multiple IO threads. Since UCX-Py doesn't support multithreading today, the process deadlocks.

Another such case, which is uncommon and potentially unsupported or at least discouraged, is creating a LocalCUDACluster and a Client that connects to that cluster via its address rather than its object, for example:

import time

from dask_cuda import LocalCUDACluster
from dask.distributed import Client


if __name__ == "__main__":
    cluster = LocalCUDACluster(protocol="ucx", interface="lo")

    # Connect via the scheduler's address (a string such as
    # 'ucx://127.0.0.1:<port>') rather than by passing the cluster object.
    client = Client(cluster.scheduler_address)
    for i in range(60):
        print(client)
        time.sleep(1)

I believe this issue will be resolved automatically once UCX-Py supports multithreading, but I'm reporting it here since it was brought to my attention recently.
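
Until then, one possible way to stay clear of the multiple-Clients case is to make sure a process only ever creates a single Client per scheduler. The following is a minimal sketch under that assumption, not something Distributed or dask-cuda provides: `get_shared_client`, `_clients`, and `_clients_lock` are hypothetical names, and `factory` stands in for whatever builds the client (for example `dask.distributed.Client`).

```python
import threading
from typing import Any, Callable, Dict

# Hypothetical helper, not part of dask, distributed, or dask-cuda:
# cache one client per scheduler address so the process does not create
# a second client (and with it a second IO thread) for the same cluster.
_clients: Dict[str, Any] = {}
_clients_lock = threading.Lock()


def get_shared_client(address: str, factory: Callable[[str], Any]) -> Any:
    """Return the cached client for `address`, creating it on first use.

    `factory` is the callable that builds the client from an address,
    e.g. `dask.distributed.Client` (an assumption for illustration).
    """
    with _clients_lock:
        client = _clients.get(address)
        if client is None:
            # First request for this address: build and remember the client.
            client = factory(address)
            _clients[address] = client
        return client
```

Usage would look like `client = get_shared_client(cluster.scheduler_address, Client)` everywhere a Client is needed. This only covers the multiple-Clients case; for the LocalCUDACluster example above, passing the cluster object itself (`Client(cluster)`) is presumably the safer route.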

pentschev, Feb 22 '22 16:02

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot], Mar 24 '22 17:03

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot], Jun 22 '22 17:06