
Error when mapping to CuPy-backed Dask array with JIT-Unspill


When trying to map a NumPy-backed Dask array to CuPy blocks with JIT-Unspill enabled, it errors as below.

Traceback
<ProxyManager dev_limit=25.40 GiB host_limit=125.97 GiB disk=0 B(0) host=0 B(0) dev=0 B(0)>: Empty
traceback:
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/threadpoolexecutor.py", line 55, in _worker
    task.run()
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/_concurrent_futures_thread.py", line 65, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/worker.py", line 4442, in apply_function
    msg = apply_function_simple(function, args, kwargs, time_delay)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/worker.py", line 4464, in apply_function_simple
    result = function(*args, **kwargs)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/worker.py", line 4336, in execute_task
    return func(*map(execute_task, args))
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask/optimization.py", line 969, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/cupy/_creation/from_data.py", line 76, in asarray
    return _core.array(a, dtype, False, order)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/rmm/rmm.py", line 212, in rmm_cupy_allocator
    buf = librmm.device_buffer.DeviceBuffer(size=nbytes, stream=stream)
  File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask_cuda/proxify_host_file.py", line 559, in oom
    traceback.print_stack(file=f)


distributed.worker - WARNING - Compute Failed
Function:  execute_task
args:      ((subgraph_callable-41122d27-006d-4784-abe7-148a8d59c5da, (<function _apply_random at 0x7ff318fd4550>, None, 'random_sample', array([4200854173, 2903766801,  493097698,  579393812, 1996154861,
       2319142642, 1854182052, 1305741790, 1726232455, 3859529310,
       2958676691,  168906341, 3847161344, 4060334578,  629460638,
       3465588087, 1710504168, 1575007080,  854227044, 3117465567,
        214958162, 1301429029, 3985615735, 3926874796, 2617439887,
        800027300, 1550251151,  983700229, 3564856360, 4246398641,
       4003869186, 3502762310, 1610002927, 4201471845, 3882268741,
       2326640944,  733798247, 2968509457, 3100625869, 1444432872,
       1768645868,  632547127, 3415977592, 4105943795,  891478217,
        509502321, 1098695002, 2210354887, 1208654412, 3506784282,
       2582711991, 1095240255, 3630514321, 1113160836,   68591545,
       2070028860,  250392651, 4120042760, 1087276075, 2650110448,
       1086754855,   66376344, 3027431323, 2833840905, 3935804296,

kwargs:    {}
Exception: "MemoryError('std::bad_alloc: out_of_memory: RMM failure at:/datasets/pentschev/miniconda3/envs/sgkit/include/rmm/mr/device/pool_memory_resource.hpp:183: Maximum pool size exceeded')"

Reproducer
import cupy as cp
import dask.array as da
import rmm

from distributed import Client, wait
from dask_cuda import LocalCUDACluster


if __name__ == "__main__":
    # Single-GPU-per-worker cluster with a 30 GB RMM pool and JIT-Unspill enabled.
    cluster = LocalCUDACluster(rmm_pool_size="30GB", jit_unspill=True)
    client = Client(cluster)
    # Have CuPy allocate from the RMM pool on every worker.
    client.run(cp.cuda.set_allocator, rmm.rmm_cupy_allocator)

    # NumPy-backed random array (~800 GB total, 80 MB chunks), mapped block-wise to CuPy.
    a = da.random.random((1000000, 100000), chunks=(10000, 1000))
    d_a = da.asarray(a).map_blocks(cp.asarray).persist()
    wait(d_a)

Earlier today @madsbk and I were taking a look at this problem, and it seems like the array is not being registered correctly in the spilling mechanism, as hinted by the ProxyManager debug line printed at the beginning of the stack above.
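
To see what the workers actually store, a quick sketch along these lines can be run (the stored_types helper is just an ad-hoc debugging function, not a dask-cuda API; it only assumes the worker's data store is a mapping, which holds for both the default spilling store and ProxifyHostFile):

def stored_types(dask_worker):
    # client.run passes the local Worker object via the special `dask_worker` kwarg.
    # Returns the distinct type names held in this worker's data store, e.g.
    # ['ProxyObject'] if the CuPy blocks were proxified (and thus tracked by the
    # ProxyManager), or ['ndarray'] if they were stored as raw cupy.ndarray.
    return sorted({type(v).__name__ for v in dask_worker.data.values()})

# client.run(stored_types)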

pentschev avatar Jan 31 '22 12:01 pentschev

CuPy arrays are ignored by jit-unspill by default, see https://github.com/rapidsai/dask-cuda/pull/568#issuecomment-824730557:

Out of curiosity, why can't cupy.ndarray be proxified just like other objects? I must have missed the explanation for that elsewhere.

It is because some functions such as cupy.dot() are written in Cython, where they cast the input to a cupy.ndarray. I am working on a solution to overcome this issue.

To enable spilling of cupy arrays set:

export DASK_JIT_UNSPILL_IGNORE="" 
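
A minimal sketch of applying the same setting from within a script (this relies on the variable being set before dask/dask_cuda are imported, since Dask collects DASK_* environment variables into its configuration at import time, and worker processes spawned by the cluster should inherit it):

import os

# Empty value = ignore no types, so CuPy arrays are proxified and can be spilled.
# Must be set before importing dask/dask_cuda.
os.environ["DASK_JIT_UNSPILL_IGNORE"] = ""

from distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_pool_size="30GB", jit_unspill=True)
    client = Client(cluster)
    # ... run the reproducer from the issue description as before ...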

Let's keep this issue open until JIT-Unspill is able to spill CuPy arrays.

madsbk avatar Jan 31 '22 14:01 madsbk

With DASK_JIT_UNSPILL_IGNORE="" the reproducer indeed completes. However, it's much slower than the default spilling. I modified the example above with the following changes:

--- dask-cuda-840.py.orig       2022-01-31 09:11:48.665748000 -0800
+++ dask-cuda-840.py    2022-01-31 09:12:30.498319000 -0800
@@ -7,10 +7,10 @@


 if __name__ == "__main__":
-    cluster = LocalCUDACluster(rmm_pool_size="30GB", jit_unspill=True)
+    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0], rmm_pool_size="30GB", memory_limit=0, jit_unspill=True)
     client = Client(cluster)
     client.run(cp.cuda.set_allocator, rmm.rmm_cupy_allocator)

-    a = da.random.random((1000000, 100000), chunks=(10000, 1000))
+    a = da.random.random((100000, 100000), chunks=(10000, 10000))
     d_a = da.asarray(a).map_blocks(cp.asarray).persist()
     wait(d_a)

With the change above, default spilling completes in 114 seconds, whereas JIT-Unspill takes 190 seconds. Another, more complex workflow I'm working on takes 15 seconds with default spilling and 89 seconds with JIT-Unspill. There is clearly still work to be done on how JIT-Unspill handles CuPy arrays.
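
For reference, a rough sketch of how the two configurations can be compared (not the exact harness used for the numbers above; the run() helper is ad hoc, and the JIT-Unspill run assumes DASK_JIT_UNSPILL_IGNORE="" is set as discussed earlier):

import time

import cupy as cp
import dask.array as da
import rmm
from distributed import Client, wait
from dask_cuda import LocalCUDACluster

def run(jit_unspill: bool) -> float:
    # One GPU, 30 GB RMM pool, host memory limit disabled, as in the diff above.
    cluster = LocalCUDACluster(
        CUDA_VISIBLE_DEVICES=[0],
        rmm_pool_size="30GB",
        memory_limit=0,
        jit_unspill=jit_unspill,
    )
    client = Client(cluster)
    client.run(cp.cuda.set_allocator, rmm.rmm_cupy_allocator)

    a = da.random.random((100000, 100000), chunks=(10000, 10000))
    start = time.perf_counter()
    wait(da.asarray(a).map_blocks(cp.asarray).persist())
    elapsed = time.perf_counter() - start

    client.close()
    cluster.close()
    return elapsed

if __name__ == "__main__":
    for flag in (False, True):
        print(f"jit_unspill={flag}: {run(flag):.1f} s")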

pentschev avatar Jan 31 '22 17:01 pentschev

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Mar 02 '22 18:03 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] avatar May 31 '22 19:05 github-actions[bot]

It looks like PR ( https://github.com/rapidsai/dask-cuda/pull/843 ) fixed some issues. Are there still remaining things to do here?

jakirkham avatar May 31 '22 23:05 jakirkham

The error is indeed fixed, but we are still slow, as per https://github.com/rapidsai/dask-cuda/issues/840#issuecomment-1026013073 . I have a more thorough description in https://github.com/rapidsai/dask-cuda/pull/853#pullrequestreview-881590671 , and @madsbk also suggested doing further testing in https://github.com/rapidsai/dask-cuda/pull/853#issuecomment-1047018981 , but I have not yet had a chance to do that. I would prefer to keep this open until we can resolve any potential performance issues.

pentschev avatar Jun 01 '22 08:06 pentschev

Sounds good. Thanks for summarizing the status as well 🙏

jakirkham avatar Jun 01 '22 08:06 jakirkham
