dask-cuda

Extend memory spilling to multiple storage media

pentschev opened this issue 6 years ago • 11 comments

With the work in progress in #35, we will have the capability of spilling CUDA device memory to host, and host memory to disk. However, as pointed out by @kkraus14 here, it would be beneficial to also allow spilling host memory to multiple user-defined storage media.

I think we could follow the same configuration structure as Alluxio, as suggested by @kkraus14. Based on the structure currently suggested in #35 (still subject to change), it would look something like the following:

cuda.worker.dirs.path=/mnt/nvme,/mnt/ssd,/mnt/nfs
cuda.worker.dirs.quota=16GB,100GB,1000GB

@mrocklin FYI

pentschev avatar Apr 21 '19 20:04 pentschev

One related note for tracking, it would be useful to leverage GPUDirect Storage to allow spilling directly from GPU memory to disk.

jakirkham avatar Nov 28 '19 21:11 jakirkham

@pentschev could you link the documentation that explains how to set up spilling to disk? I found https://github.com/rapidsai/dask-cuda/pull/51/files, but there doesn't seem to be any documentation on the new feature. I want to use dask_cudf to spill from vmem (device memory) to main memory, and then from main memory to disk when main memory is not enough. Searching https://docs.rapids.ai/ doesn't provide any answer.

jangorecki avatar May 27 '21 13:05 jangorecki

In rapidsai/cudf#3740 I linked to: https://docs.rapids.ai/api/dask-cuda/nightly/spilling.html

quasiben avatar May 27 '21 13:05 quasiben

This doc doesn't seem to answer my use case.

jangorecki avatar May 27 '21 14:05 jangorecki

Currently, --device-memory-limit/device_memory_limit (dask-cuda-worker/LocalCUDACluster) will spill from device to host. Similarly, --memory-limit/memory_limit spills from host to disk, just like in mainline Dask, and the spilled data is stored in --local-directory/local_directory. Today, spilling to disk is only supported by the default mechanism; JIT spilling doesn't support it yet.
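
For example, a minimal sketch with LocalCUDACluster (the limit values and spill directory below are illustrative, not recommendations):

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster(
    device_memory_limit="10GB",         # spill device memory to host above this
    memory_limit="32GB",                # spill host memory to disk above this (mainline Dask)
    local_directory="/tmp/dask-spill",  # where spilled data is written on disk
)
client = Client(cluster)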

pentschev avatar May 27 '21 15:05 pentschev

@pentschev thank you for the reply, although it doesn't correspond to my current approach (cu.set_allocator("managed")). AFAIU, to use it with dask I should have

import cudf as cu  # assuming `cu` refers to cudf
from dask.distributed import Client
client = Client(cluster)
client.run(cu.set_allocator, "managed")  # set the managed memory allocator on each worker

Is this going to handle spilling vmem -> mem -> disk? I don't want to change the default memory limits, only enable spilling.

jangorecki avatar May 27 '21 16:05 jangorecki

No, managed memory is handled by the CUDA driver: we have no control over how it handles spilling, and it doesn't support spilling to disk at all. Within Dask, you can enable spilling as I mentioned above. That path doesn't make use of managed memory and thus is not as performant, but it allows Dask to spill Python memory (i.e., Dask array/dataframe chunks); it has no control, however, over memory that's handled internally by libraries such as cuDF.
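
For completeness, managed memory can also be enabled when the cluster is created, instead of via client.run; a minimal sketch, assuming a dask-cuda version that exposes the rmm_managed_memory option:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# With managed memory, the CUDA driver transparently oversubscribes device memory
# into host memory; it does not spill to disk.
cluster = LocalCUDACluster(rmm_managed_memory=True)
client = Client(cluster)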

pentschev avatar May 27 '21 17:05 pentschev
