dea-notebooks
Local Dask cluster can fill Sandbox disks
The current Dask configuration spills to disk when workers run out of memory, and it saves to a local directory, ./dask-worker-space, next to the notebook that's being run.
As a short-term fix, open a terminal and run du -sh ./* and you might see one folder with gigabytes of data in it. If you chase that down and it's a dask-worker-space directory, it can be safely deleted.
The longer-term fix is configuring Dask to write its files to a temporary location, so that they get cleaned up automatically.

Use this command in the terminal to list all Dask caches:
find . -type d -name dask-worker-space
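For anyone who would rather check from inside a notebook, here is a rough Python equivalent of the find command above that also reports how much space each cache uses. It is only a sketch: the helper name find_dask_caches is made up for illustration, and it uses nothing beyond the standard library.

```python
import os
from pathlib import Path


def find_dask_caches(root="."):
    """Return (path, size_in_bytes) pairs for every dask-worker-space
    directory under root, largest first.

    Hypothetical helper for illustration; not part of Dask.
    """
    results = []
    for dirpath, dirnames, _ in os.walk(root):
        if "dask-worker-space" in dirnames:
            cache = Path(dirpath) / "dask-worker-space"
            # Sum the sizes of all regular files inside the cache.
            size = sum(f.stat().st_size for f in cache.rglob("*") if f.is_file())
            results.append((cache, size))
            # Do not descend into the cache directory itself.
            dirnames.remove("dask-worker-space")
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for cache, size in find_dask_caches("."):
        print(f"{size / 1e9:8.2f} GB  {cache}")
```

Nothing is deleted here; the output is just a list to review before removing anything by hand.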
By default, Dask creates a dask-worker-space directory next to the notebook that started the Dask client.
The relevant configuration is dask.temporary-directory:
https://docs.dask.org/en/latest/configuration-reference.html#dask.temporary-directory
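A minimal sketch of what that could look like from a notebook, assuming the override is applied before the Dask client or cluster is created so the workers pick it up. The helper names scratch_config and use_scratch_dir are hypothetical, and using the system temp directory is an assumption; whether that location is cleaned up automatically depends on how the Sandbox is set up.

```python
import tempfile


def scratch_config(scratch=None):
    """Build a Dask config override that moves worker scratch space.

    Assumption: the system temp directory (e.g. /tmp) is an acceptable
    scratch location; pass an explicit path to use something else.
    """
    return {"temporary-directory": scratch or tempfile.gettempdir()}


def use_scratch_dir(scratch=None):
    """Apply the override; call this before creating the Dask client."""
    import dask  # imported lazily so scratch_config works without Dask installed

    config = scratch_config(scratch)
    dask.config.set(config)
    return config["temporary-directory"]
```

The same key can also go in a Dask YAML config file or environment variable, which would avoid having to remember the call in every notebook.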
But we should also consider tweaking the spill-to-disk behaviour, distributed.worker.memory.{target|spill|pause|terminate}:
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.target
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.spill
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.pause
https://docs.dask.org/en/latest/configuration-reference.html#distributed.worker.memory.terminate
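As a rough illustration of tweaking those four thresholds together, the sketch below builds the override as a plain dict. The helper name memory_config is hypothetical, the default values shown are the Dask defaults as I understand them (0.60, 0.70, 0.80, 0.95), and the ordering check is my own sanity check rather than something Dask enforces. Each value is a fraction of the worker memory limit.

```python
def memory_config(target=0.60, spill=0.70, pause=0.80, terminate=0.95):
    """Build overrides for the distributed.worker.memory.* thresholds.

    Hypothetical helper for illustration. The ordering check below is a
    sanity check added here, not a rule imposed by Dask.
    """
    if not 0 < target <= spill <= pause <= terminate <= 1:
        raise ValueError("expected 0 < target <= spill <= pause <= terminate <= 1")
    return {
        "distributed.worker.memory.target": target,        # start moving data to disk
        "distributed.worker.memory.spill": spill,          # spill based on process memory
        "distributed.worker.memory.pause": pause,          # stop running new tasks
        "distributed.worker.memory.terminate": terminate,  # nanny restarts the worker
    }


# Usage (requires Dask; apply before creating the client):
#   import dask
#   dask.config.set(memory_config(target=0.80, spill=0.90, pause=0.95))
```

Raising target and spill means less data gets written to disk, at the cost of workers running closer to their memory limit before anything is offloaded.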
This is not a DEA Notebooks-specific issue, so closing it here to raise elsewhere.