Error when mapping to CuPy-backed Dask array with JIT-Unspill
When mapping a NumPy-backed Dask array to CuPy blocks with JIT-Unspill enabled, the computation fails with the error below.
Traceback
<ProxyManager dev_limit=25.40 GiB host_limit=125.97 GiB disk=0 B(0) host=0 B(0) dev=0 B(0)>: Empty
traceback:
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/threadpoolexecutor.py", line 55, in _worker
task.run()
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/_concurrent_futures_thread.py", line 65, in run
result = self.fn(*self.args, **self.kwargs)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/worker.py", line 4442, in apply_function
msg = apply_function_simple(function, args, kwargs, time_delay)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/worker.py", line 4464, in apply_function_simple
result = function(*args, **kwargs)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/distributed/worker.py", line 4336, in execute_task
return func(*map(execute_task, args))
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask/optimization.py", line 969, in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/cupy/_creation/from_data.py", line 76, in asarray
return _core.array(a, dtype, False, order)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/rmm/rmm.py", line 212, in rmm_cupy_allocator
buf = librmm.device_buffer.DeviceBuffer(size=nbytes, stream=stream)
File "/datasets/pentschev/miniconda3/envs/sgkit/lib/python3.8/site-packages/dask_cuda/proxify_host_file.py", line 559, in oom
traceback.print_stack(file=f)
distributed.worker - WARNING - Compute Failed
Function: execute_task
args: ((subgraph_callable-41122d27-006d-4784-abe7-148a8d59c5da, (<function _apply_random at 0x7ff318fd4550>, None, 'random_sample', array([4200854173, 2903766801, 493097698, 579393812, 1996154861,
2319142642, 1854182052, 1305741790, 1726232455, 3859529310,
2958676691, 168906341, 3847161344, 4060334578, 629460638,
3465588087, 1710504168, 1575007080, 854227044, 3117465567,
214958162, 1301429029, 3985615735, 3926874796, 2617439887,
800027300, 1550251151, 983700229, 3564856360, 4246398641,
4003869186, 3502762310, 1610002927, 4201471845, 3882268741,
2326640944, 733798247, 2968509457, 3100625869, 1444432872,
1768645868, 632547127, 3415977592, 4105943795, 891478217,
509502321, 1098695002, 2210354887, 1208654412, 3506784282,
2582711991, 1095240255, 3630514321, 1113160836, 68591545,
2070028860, 250392651, 4120042760, 1087276075, 2650110448,
1086754855, 66376344, 3027431323, 2833840905, 3935804296,
kwargs: {}
Exception: "MemoryError('std::bad_alloc: out_of_memory: RMM failure at:/datasets/pentschev/miniconda3/envs/sgkit/include/rmm/mr/device/pool_memory_resource.hpp:183: Maximum pool size exceeded')"
Reproducer
import cupy as cp
import dask.array as da
import rmm
from distributed import Client, wait
from dask_cuda import LocalCUDACluster
if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_pool_size="30GB", jit_unspill=True)
    client = Client(cluster)
    client.run(cp.cuda.set_allocator, rmm.rmm_cupy_allocator)
    a = da.random.random((1000000, 100000), chunks=(10000, 1000))
    d_a = da.asarray(a).map_blocks(cp.asarray).persist()
    wait(d_a)
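For context on why spilling has to kick in at all: the array in the reproducer is far larger than the 30 GB RMM pool. A quick back-of-the-envelope check (plain Python, nothing dask-cuda specific):

# float64 elements are 8 bytes each
shape = (1_000_000, 100_000)
chunks = (10_000, 1_000)

total_gb = shape[0] * shape[1] * 8 / 1e9    # ~800 GB for the whole array
chunk_mb = chunks[0] * chunks[1] * 8 / 1e6  # ~80 MB per chunk, 10,000 chunks total
print(total_gb, chunk_mb)                   # 800.0 80.0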
Earlier today @madsbk and I took a look at this problem, and it seems the array is not being registered correctly in the spilling mechanism, as hinted by the ProxyManager debug line at the top of the traceback above (dev=0 B, host=0 B).
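One way to poke at what the workers are actually tracking from the client (a rough diagnostic sketch, not part of the reproducer; it only relies on the standard distributed behavior that client.run passes the worker instance to functions that take a dask_worker argument):

def data_backend(dask_worker):
    # With jit_unspill=True the worker's data mapping should be dask-cuda's
    # ProxifyHostFile; the key count shows how many objects it is tracking.
    return type(dask_worker.data).__name__, len(dask_worker.data)

print(client.run(data_backend))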
CuPy arrays are ignored by JIT-Unspill by default, see https://github.com/rapidsai/dask-cuda/pull/568#issuecomment-824730557:

Out of curiosity, why can't cupy.ndarray be proxified just like other objects? I must have missed the explanation for that elsewhere.

It is because some functions such as cupy.dot() are written in Cython where they cast the input to a cupy.ndarray. I am working on a solution to overcome this issue.
To enable spilling of CuPy arrays, set:
export DASK_JIT_UNSPILL_IGNORE=""
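For completeness, the same setting can be applied from Python before the cluster is created (a minimal sketch; it assumes the environment variable is inherited by the worker processes, so it has to be set before LocalCUDACluster starts them):

import os

# Empty ignore list: JIT-Unspill will proxify cupy.ndarray objects as well.
# This must happen before LocalCUDACluster spawns the workers, which inherit
# the parent process environment (an assumption of this sketch).
os.environ["DASK_JIT_UNSPILL_IGNORE"] = ""

from distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster(rmm_pool_size="30GB", jit_unspill=True)
    client = Client(cluster)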
Let's keep this issue open until JIT-Unspill is able to spill CuPy arrays.
With DASK_JIT_UNSPILL_IGNORE="" the reproducer indeed completes. However, it's much slower than the default spilling. I modified the example above with the following changes:
--- dask-cuda-840.py.orig 2022-01-31 09:11:48.665748000 -0800
+++ dask-cuda-840.py 2022-01-31 09:12:30.498319000 -0800
@@ -7,10 +7,10 @@
 if __name__ == "__main__":
-    cluster = LocalCUDACluster(rmm_pool_size="30GB", jit_unspill=True)
+    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[0], rmm_pool_size="30GB", memory_limit=0, jit_unspill=True)
     client = Client(cluster)
     client.run(cp.cuda.set_allocator, rmm.rmm_cupy_allocator)
-    a = da.random.random((1000000, 100000), chunks=(10000, 1000))
+    a = da.random.random((100000, 100000), chunks=(10000, 10000))
     d_a = da.asarray(a).map_blocks(cp.asarray).persist()
     wait(d_a)
With the change above, default spilling completes in 114 seconds, whereas JIT-Unspill takes 190 seconds. Another, more complex workflow I'm working on takes 15 seconds with default spilling and 89 seconds with JIT-Unspill. There is clearly still work to be done on CuPy arrays with JIT-Unspill.
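For reference, the timings above are simple wall-clock measurements around persist/wait; a minimal sketch of such a harness (not the exact script used) looks like this, reusing the names from the reproducer:

import time

# Time how long it takes to materialize the CuPy-backed array on the workers.
start = time.perf_counter()
d_a = da.asarray(a).map_blocks(cp.asarray).persist()
wait(d_a)
print(f"persist + wait: {time.perf_counter() - start:.1f} s")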
It looks like PR https://github.com/rapidsai/dask-cuda/pull/843 fixed some issues. Is there anything still left to do here?
The error is indeed fixed, but we are still slow as per https://github.com/rapidsai/dask-cuda/issues/840#issuecomment-1026013073 . I have a more thorough description in https://github.com/rapidsai/dask-cuda/pull/853#pullrequestreview-881590671 , and @madsbk also suggested doing further testing in https://github.com/rapidsai/dask-cuda/pull/853#issuecomment-1047018981 , but I have not had the chance to do that yet. I would prefer to keep this open until we can resolve any potential performance issues.
Sounds good. Thanks for summarizing the status as well 🙏