Local Dask cluster can fill Sandbox disks

Open alexgleith opened this issue 4 years ago • 2 comments
The current Dask configuration spills to disk when it runs out of memory, and it writes to a local directory, ./dask-worker-space, next to the notebook that's being run.

As a short-term fix, open a terminal and run du -sh ./* to look for a folder holding gigabytes of data. If you chase that down and it turns out to be a dask-worker-space directory, it can be safely deleted.

The longer-term fix is to configure Dask to write its files to a temporary location, so that they get cleaned up automatically.

alexgleith avatar Nov 24 '20 21:11 alexgleith

Use this command in the terminal to list all Dask caches:

find . -type d -name dask-worker-space

By default, Dask creates a dask-worker-space directory next to the notebook that started the Dask client.
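For anyone who would rather check from a notebook cell than a terminal, here is a rough Python equivalent of the find and du commands above. It is only a sketch using the standard library; the helper names find_dask_worker_spaces and dir_size_bytes are made up for illustration and are not part of Dask.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def find_dask_worker_spaces(start="."):
    """Return (path, size_in_bytes) for every dask-worker-space directory
    found below `start`, largest first."""
    results = []
    for root, dirs, _files in os.walk(start):
        if "dask-worker-space" in dirs:
            target = Path(root) / "dask-worker-space"
            results.append((target, dir_size_bytes(target)))
            # No need to search inside the cache directory itself.
            dirs.remove("dask-worker-space")
    return sorted(results, key=lambda item: item[1], reverse=True)


# Example usage (hypothetical): report sizes in GB for the current directory.
# for path, size in find_dask_worker_spaces("."):
#     print(f"{size / 1e9:8.2f} GB  {path}")
```

This only reports the directories; it deliberately does not delete anything.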

Kirill888 avatar Nov 24 '20 22:11 Kirill888

The relevant configuration option is dask.temporary-directory:

https://docs.dask.org/en/latest/configuration-reference.html#dask.temporary-directory
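A minimal sketch of setting this from Python, assuming the system temporary directory is a suitable place for spill files on the Sandbox and that it actually gets cleaned up there (neither is confirmed in this thread). In dask.config the key is written as "temporary-directory", and it needs to be set before the client or cluster is created.

```python
import tempfile

import dask

# Assumption: the system temp directory (often /tmp) is an acceptable
# location for Dask's spill files in this environment.
spill_dir = tempfile.gettempdir()

# Must run before creating the Dask client / local cluster so that
# workers pick up the new location.
dask.config.set({"temporary-directory": spill_dir})

print(dask.config.get("temporary-directory"))
```

The same setting could presumably be applied environment-wide through Dask's YAML config files or the DASK_TEMPORARY_DIRECTORY environment variable rather than per notebook.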

We should also consider tweaking the spill-to-disk behaviour via distributed.worker.memory.{target|spill|pause|terminate}:

https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.target
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.spill
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.pause
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.terminate
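As an illustration of what such a tweak might look like, the sketch below turns off spilling to disk and leaves the pause and terminate thresholds as fractions of the worker memory limit. The specific values are assumptions chosen for the example, not a recommendation for the Sandbox, and they need to be set before the cluster is started.

```python
import dask

# Illustrative values only (assumptions, not tested Sandbox settings).
memory_settings = {
    # False disables moving data to disk at these two stages.
    "distributed.worker.memory.target": False,
    "distributed.worker.memory.spill": False,
    # Fraction of the memory limit at which a worker stops accepting new work.
    "distributed.worker.memory.pause": 0.80,
    # Fraction of the memory limit at which the worker is restarted.
    "distributed.worker.memory.terminate": 0.95,
}

# Apply before creating the Dask client / local cluster.
dask.config.set(memory_settings)

print(dask.config.get("distributed.worker.memory"))
```

With spilling disabled, nothing should be written to dask-worker-space for memory reasons, but workers will pause or restart sooner when a computation does not fit in memory, so this is a trade-off rather than a pure fix.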

Kirill888 avatar Nov 24 '20 22:11 Kirill888

This is not a DEA Notebooks-specific issue, so closing it here to raise elsewhere.

robbibt avatar Jun 30 '23 06:06 robbibt