                        Putting `dask-worker-space` in `/tmp` is unfriendly on shared systems
Since d59500ea97c02753eac9d42951d9c4a5d4f17685 (#6658), launching workers can fail on shared systems if someone else happens to be running at the same time (since they will have created `/tmp/dask-worker-space` and we won't have write permissions).
A friendlier approach would probably be to use `tempfile.mkdtemp` to create the directory. This has the disadvantage that if a cluster fails and doesn't clean up, a new run would not clean out the old lock directory.
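A minimal sketch of that approach (the prefix is just illustrative):

```python
import tempfile

# Sketch: create a unique, private scratch directory per run instead of
# a fixed, shared /tmp/dask-worker-space path. mkdtemp() creates the
# directory with mode 0o700, so other users on a shared system can
# neither collide with it nor read its contents.
worker_space = tempfile.mkdtemp(prefix="dask-worker-space-")
print(worker_space)  # e.g. /tmp/dask-worker-space-a1b2c3d4
```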
It is worth noting that most Linux systems provide `/tmp` via `tmpfs`, which effectively provides a RAM disk. As a result, data that is spilled from memory to tmpfs doesn't actually leave memory. This could cause a lot of churn, with workers trying to spill and free memory without meaningfully reducing memory pressure.
@crusaderky any thoughts here? Pinging you as it looks like you pushed https://github.com/dask/distributed/pull/6658 over the finish line
If this causes problems, we can also revert https://github.com/dask/distributed/pull/6658
I still believe CWD is not a good choice for these things but it might be the lesser evil
Yeah, I was trying to think of what a better default would be, but I agree it is tricky. Sometimes the home directory is shared via NFS, which makes it a bad scratch-space location.
Some of this is also a documentation issue (https://github.com/dask/distributed/issues/4000). Currently we have this, but it shows how to configure a single worker, whereas users likely benefit more from knowing about setting `temporary-directory` in `dask.config` or using the `DASK_TEMPORARY_DIRECTORY` environment variable to set this as part of shell or container startup. Maybe these options could be listed here?
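For reference, setting this looks roughly like the following (the `/scratch/my-user` path is only an example):

```python
import dask

# Programmatically, before any workers are created:
dask.config.set({"temporary-directory": "/scratch/my-user"})

# Equivalently, as part of shell or container startup:
#   export DASK_TEMPORARY_DIRECTORY=/scratch/my-user
#
# Or persistently in the YAML config (e.g. ~/.config/dask/distributed.yaml):
#   temporary-directory: /scratch/my-user
```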
Yes, I think the OP about permissions in a shared FS could be handled by better documentation. I think the shared system is a sufficiently rare case that we could point to the documentation and ask users to configure `temporary-directory` in their dask/distributed.yaml.
What worries me is your comment about `/tmp` being backed by tmpfs, since spilling to a RAM disk obviously defeats the purpose of spilling, and something like this should not be the default.
Is there a pythonic way to infer whether or not a directory is backed by tmpfs? If so, we could try to use `/tmp` first and fall back to CWD in that case.
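There's no built-in one-liner for this, but on Linux a best-effort check could walk `/proc/mounts`; a sketch:

```python
import os

def is_tmpfs(path: str) -> bool:
    """Best-effort, Linux-only check: is ``path`` backed by tmpfs?

    Sketch only: walks /proc/mounts and picks the longest mount point
    that is a prefix of the resolved path.
    """
    path = os.path.realpath(path)
    best_fstype, best_len = None, -1
    try:
        with open("/proc/mounts") as f:
            for line in f:
                _, mount_point, fstype, *_ = line.split()
                prefix = mount_point.rstrip("/") + "/"
                if (path == mount_point or path.startswith(prefix)) and len(mount_point) > best_len:
                    best_fstype, best_len = fstype, len(mount_point)
    except FileNotFoundError:  # not Linux
        return False
    return best_fstype in ("tmpfs", "ramfs")

# Try /tmp first, fall back to CWD if it is RAM-backed:
scratch = os.getcwd() if is_tmpfs("/tmp") else "/tmp"
```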
> launching workers can fail on shared systems if someone else happens to be running at the same time (since they will have created `/tmp/dask-worker-space` and we won't have write permissions).
Right; if writing to `/tmp` it should be `/tmp/dask-worker-space-$USER`.
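Roughly (sketch; `getpass.getuser()` avoids relying on `$USER` being exported):

```python
import getpass
import os
import tempfile

# Sketch of a per-user default, roughly /tmp/dask-worker-space-$USER.
worker_space = os.path.join(
    tempfile.gettempdir(), f"dask-worker-space-{getpass.getuser()}"
)
# mode only applies if the directory is newly created.
os.makedirs(worker_space, mode=0o700, exist_ok=True)
```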
> A friendlier approach would probably be to use `tempfile.mkdtemp` to create the directory.
We're already doing that for the directories inside `dask-worker-space`.
> It is worth noting that most Linux systems provide `/tmp` via `tmpfs`, which effectively provides a RAM disk.
Not quite - it's a ramdisk backed by the swap file.
> Sometimes the home directory is shared via NFS
Or on cloud services it can be an EBS volume or equivalent. NFS also causes a lot of problems with locks; the test suite used to fail a lot before #6658 if your workspace was on NFS.
> I still believe CWD is not a good choice for these things but it might be the lesser evil
I don't think we should revert #6658, due to the test suite. If we do want to go back to CWD, we should:
- leave Nanny and Worker as they are (default to `/tmp`)
- change `dask-worker` to default to CWD, e.g. always pass the `local_directory` parameter
- change all tests that start `dask-worker` to temporarily move CWD to `/tmp` through a fixture (see the sketch below)
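The fixture in the last bullet could be as small as this sketch, using pytest's built-in `tmp_path` and `monkeypatch`:

```python
import pytest

# Hypothetical fixture: run each test with CWD moved to a fresh
# temporary directory, so a CWD-defaulting dask-worker doesn't litter
# the repository checkout.
@pytest.fixture
def tmp_cwd(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)  # restored automatically after the test
    yield tmp_path
```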
> I don't think we should revert https://github.com/dask/distributed/pull/6658, due to the test suite. If we do want to go back to CWD, we should:
> - leave Nanny and Worker as they are (default to `/tmp`)
> - change `dask-worker` to default to CWD, e.g. always pass the `local_directory` parameter
> - change all tests that start `dask-worker` to temporarily move CWD to `/tmp` through a fixture
I like this proposal, but I don't know how this impacts the various deployment tools. cc @jacobtomlinson
> It is worth noting that most Linux systems provide `/tmp` via `tmpfs`, which effectively provides a RAM disk.
> Not quite - it's a ramdisk backed by the swap file.
That means it would actually swap to disk if memory pressure is high? That would be fine, I guess.
> It is worth noting that most Linux systems provide `/tmp` via `tmpfs`, which effectively provides a RAM disk.
> Not quite - it's a ramdisk backed by the swap file.
> That means it would actually swap to disk if memory pressure is high? That would be fine, I guess.
Assuming there is a swap file configured.
> Right; if writing to `/tmp` it should be `/tmp/dask-worker-space-$USER`.
This seems like a reasonable default (even in the single-user case) which would certainly avoid the initial problem. What are the downsides compared to sticking with `dask-worker-space`?
> Not quite - it's a ramdisk backed by the swap file.
> Assuming there is a swap file configured.
If you mounted tmpfs on `/tmp` but didn't configure a swap partition, your OS is poorly set up and that is not an application problem.
> - leave Nanny and Worker as they are (default to `/tmp`)
> - change `dask-worker` to default to CWD, e.g. always pass the `local_directory` parameter
@crusaderky I'm not excited about having different defaults here. Some deployment tools use `dask-worker` and some use `dask-spec`, which invokes the Nanny, so I wouldn't feel great about the inconsistency.
> `/tmp/dask-worker-space-$USER`
@crusaderky This seems reasonable, although it may break down on containerized systems where `$USER` resolves to the same value, like `root` or `jovyan`, for everyone.
> I think the shared system is a sufficiently rare case
@fjetter Is it? This seems very common on HPC.
> If you mounted tmpfs on `/tmp` but didn't configure a swap partition, your OS is poorly set up and that is not an application problem.
While a niche use case for dask (I suspect), this is unfortunately often the default in HPC systems where you don't have control over the OS setup.
> @crusaderky This seems reasonable, although it may break down on containerized systems where `$USER` resolves to the same value, like `root` or `jovyan`, for everyone.
If it's inside the container, things should be fine, no?
> While a niche use case for dask (I suspect), this is unfortunately often the default in HPC systems where you don't have control over the OS setup.
I wouldn't call HPC a niche Dask case.
> If it's inside the container, things should be fine, no?
Often `/tmp` is mounted in.
FWIW, the user survey from last year shows that the 2nd most popular way of launching Dask (1st is SSH) is with HPC tooling:
- https://blog.dask.org/images/2021_survey/2021_survey_31_0.png
- https://blog.dask.org/2021/09/15/user-survey
@jrbourbeau just opened https://github.com/dask/community/issues/269 to get stats for this year
> If it's inside the container, things should be fine, no?
> Often `/tmp` is mounted in.
Holy loss of segregation, Batman! Jokes aside, won't all the containers share the same user on the host VM?
It depends on the HPC: if it is using Singularity or something similar, the usernames will be correct (and therefore unique). But if the container runtime does user namespacing, like Docker does, then all users will have the same username.
So if they all have the same username and the same `/tmp`, then `/tmp/dask-worker-space-$USER` won't help.
`/tmp/dask-worker-space-$UUID` then.
Back to tmpfs vs. CWD: I'm very concerned about Jupyter notebooks stored on NFS. Such notebooks will spill to NFS when they start a LocalCluster. I think that for the devbox use case tmpfs is a better default than CWD. On the flip side, I think it's reasonable to ask users that are doing production deployments to think their parameters through.
With user namespacing the UUID will also be the same for all users, so I'm afraid that doesn't help.
I agree with the concerns about it being CWD, but the concerns around tmpfs are also valid. Neither is a particularly good option for certain (large) groups of our users. I commonly see HPC users set this option to somewhere like `/scratch`, but I rarely see cloud/Kubernetes users set it, so maybe `/tmp` and better documentation is the lesser of two evils here.
An alternative would be to always warn if the user doesn't explicitly set this, to encourage it to always be set. Or maybe just warn if there are existing `dask-worker-space` folders in the default location?
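That second idea could be a small startup check along these lines (sketch only; the glob pattern and message wording are illustrative):

```python
import glob
import os
import tempfile
import warnings

# Sketch: warn at startup if the default location already contains
# worker-space folders from earlier runs (or from other users).
pattern = os.path.join(tempfile.gettempdir(), "dask-worker-space*")
leftovers = [p for p in glob.glob(pattern) if os.path.isdir(p)]
if leftovers:
    warnings.warn(
        f"Found existing worker space directories {leftovers}; "
        "consider setting the 'temporary-directory' config value explicitly."
    )
```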
> With user namespacing the UUID will also be the same for all users, so I'm afraid that doesn't help.
I meant `uuid.uuid4()`, not `$UID`. Sorry for the confusion.
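i.e. roughly (sketch):

```python
import os
import tempfile
import uuid

# Sketch of the uuid.uuid4() variant: unique per process rather than
# per user, so it still works when user namespacing collapses all
# usernames to the same value.
worker_space = os.path.join(
    tempfile.gettempdir(), f"dask-worker-space-{uuid.uuid4().hex}"
)
os.makedirs(worker_space, mode=0o700)
```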
> @fjetter Is it? This seems very common on HPC.
Could probably be caught by fixing this once in dask-jobqueue. I doubt that there are many HPC users with a custom solution. I was mostly thinking about users that just spin up a cluster using the default settings, i.e. without any configuration or any wrapper, e.g. LocalCluster on a big VM or a laptop.
Just to be clear, I don't have a strong preference here. I introduced this change because I was annoyed about CWD but wasn't aware of any other implications. I'm cool with either solution.
HPC users commonly use dask-jobqueue, dask-ssh, dask-mpi, and dask-gateway, so if we wanted to handle it downstream, those would be the places to do it.