miniwdl
option for task /tmp mount
If the task command needs to use scratch disk space intensively (to create big files not meant to be output), it's suboptimal to do so either (i) in the initial working directory, which could be a shared filesystem mount, or (ii) inside the Docker container writable layer, which might have some overhead.
Provide a runner option to mount the container's /tmp directory to some host directory that we clean up afterwards, perhaps:
--tmpdir DIR    mount each task container's /tmp to a new directory under host DIR, which is deleted on exit
(We're avoiding a general mount feature, which would create a non-portable side channel for getting files in and out of a task's container.)
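A minimal sketch of the semantics proposed above, assuming a single host where the runner process and the Docker daemon see the same filesystem (exactly the assumption that a multi-node design would break). The names `task_scratch` and `tmp_mount_args` are hypothetical and are not part of miniwdl; the sketch only builds arguments for a plain `docker run` rather than showing how miniwdl actually launches containers.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def task_scratch(host_dir):
    """Yield a fresh scratch directory under host_dir; delete it on exit."""
    Path(host_dir).mkdir(parents=True, exist_ok=True)
    # One new directory per task, so concurrent tasks never share /tmp.
    scratch = Path(tempfile.mkdtemp(prefix="task_tmp_", dir=host_dir))
    try:
        yield scratch
    finally:
        # Clean up even if the task failed.
        shutil.rmtree(scratch, ignore_errors=True)


def tmp_mount_args(scratch):
    """Build `docker run` arguments that bind-mount scratch at the container's /tmp."""
    return ["--mount", f"type=bind,source={scratch},target=/tmp"]
```

A runner could wrap each task in `with task_scratch(DIR) as scratch:` and append `tmp_mount_args(scratch)` to the container launch arguments. Only /tmp is exposed, so this does not become a general-purpose mount.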
This turns out to be tricky to do in a way that won't be incompatible with a future multi-node design:
- as the scratch directory isn't shared, we can't manipulate it from the miniwdl process
- we can't do everything we'd like from the in-container bootstrap script, because the scratch directory has to have been already mounted by that point
- to manipulate it from another Docker container, we'd need a way to guarantee that container gets scheduled on the same node as the task's container
Also, a reference suggesting OverlayFS might not introduce a lot of overhead: http://www-scf.usc.edu/~qiumin/pubs/ISPASS17_docker.pdf
Attempted workaround: manually copy the inputs into the scratch directory (in my case, /dev/shm), and call miniwdl from the scratch directory with this cfg:
[file_io]
mount_tmpdir = true
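
The staging step of this workaround could be scripted roughly as follows. This is only a sketch: `stage_run` is a hypothetical helper, the cfg file name is arbitrary, and the cfg text is simply the two lines quoted above.

```python
import shutil
from pathlib import Path

# The configuration quoted above, written out verbatim.
CFG_TEXT = "[file_io]\nmount_tmpdir = true\n"


def stage_run(inputs, scratch_dir):
    """Copy input files into scratch_dir and write the cfg file next to them.

    Returns the staged input paths and the cfg path.
    """
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the destination path of each copied file.
    staged = [Path(shutil.copy2(src, scratch)) for src in inputs]
    cfg = scratch / "miniwdl.cfg"  # arbitrary file name chosen for this sketch
    cfg.write_text(CFG_TEXT)
    return staged, cfg
```

After staging, miniwdl would be invoked from the scratch directory with the staged paths as inputs and the cfg file supplied to it, for example via the `--cfg` option of `miniwdl run` (to my understanding; check `miniwdl run --help`).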