miniwdl
Support mounting docker volumes in worker containers
Miniwdl (depending on settings) currently mounts a variety of files/directories from its --dir directory into its task containers. The current implementation assumes either:
- Miniwdl isn't being run from inside a container itself
- If miniwdl is being run from inside a container, its --dir directory is mounted from the host, at the same path that it's mounted inside the container
Both of these assumptions get pretty weird when running miniwdl in a Docker Desktop container on OSX -- there are workarounds, but I think these assumptions decrease portability overall, given that Docker volumes are a 100% standard way to manage persistence in Docker.
What I have in mind at the moment is adding a flag along the lines of --docker-volume-dir to tell miniwdl that the directory given in its --dir parameter is mounted from a Docker volume, and that it should mount that volume into the containers it creates instead of creating mounts to the host's filesystem for each input file.
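To make the difference concrete, here is a rough sketch of the two mounting strategies, expressed as values for Docker's `--mount` option. It is not miniwdl's actual code: `Mount`, `plan_mounts`, the container paths `/work` and `/inputs`, and the volume name `wdl_data` are all made up for illustration, and mounting the volume at the same path miniwdl sees it is just one possible design.

```python
import posixpath
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mount:
    """One mount for a task container (hypothetical helper type)."""
    kind: str      # "bind" for a host path, "volume" for a named Docker volume
    source: str    # host path (bind) or volume name (volume)
    target: str    # path inside the task container
    readonly: bool = False

    def to_cli(self) -> str:
        """Render as a value for docker's --mount option."""
        parts = [f"type={self.kind}", f"source={self.source}", f"target={self.target}"]
        if self.readonly:
            parts.append("readonly")
        return ",".join(parts)


def plan_mounts(run_dir: str, task_dir: str, inputs: List[str],
                volume: Optional[str] = None) -> List[Mount]:
    """Choose mounts for one task; `volume` stands in for the proposed flag."""
    if volume is not None:
        # Proposed mode (assumption): mount the whole volume at the path where
        # miniwdl itself sees it, so paths under --dir resolve identically.
        return [Mount("volume", volume, run_dir)]
    # Simplified picture of the current mode: a read/write bind mount for the
    # task's working directory plus one read-only bind mount per input file.
    mounts = [Mount("bind", task_dir, "/work")]
    for path in inputs:
        target = posixpath.join("/inputs", posixpath.basename(path))
        mounts.append(Mount("bind", path, target, readonly=True))
    return mounts
```

Each rendered string could be passed to `docker run --mount ...`. Note that the volume branch yields a single read/write mount, so the per-file read-only granularity of the bind-mount branch is lost.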
Thanks -- I like this idea; there's approximate precedent in how miniwdl-aws has an EFS filesystem mounted into all containers (including the one miniwdl itself runs in).
A couple details we need to think about:
- Top-level workflow inputs and outputs. Are we ok with requiring the user to place all the inputs in the volume beforehand, and retrieve outputs from it afterwards? Or will they need/want help staging them to/from the host filesystem?
- Isolation between containers. Miniwdl uses multiple bind-mounts for each container to provide each task with read/write access to its working directory, read-only access to its input files, and no access to anything else on the host filesystem. I don't know if we can get the same granular control with docker volumes; if we just mount the volume read/write into all containers, then any task is technically able to overwrite its input files, or anything else for that matter. That's partly a security issue, but the bigger concern is clever coders figuring this out, assuming it's OK, and pushing non-portable stuff. (miniwdl-aws makes the same tradeoff with EFS and we consider it acceptable in that context.)
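On the first point, staging to and from a volume does not need much machinery. A minimal sketch, assuming a throwaway container is acceptable for the copy: the function below only builds the `docker run` argument list. `copy_command`, the `/host` and `/vol` mount points, and the default `alpine` image are illustrative choices, not anything miniwdl provides.

```python
from typing import List


def copy_command(volume: str, host_dir: str, to_volume: bool = True,
                 image: str = "alpine") -> List[str]:
    """Build a `docker run` command that copies a directory's contents
    between a host directory and a named Docker volume.

    host_dir must be an absolute path, as bind mounts require.
    """
    src, dst = ("/host", "/vol") if to_volume else ("/vol", "/host")
    return [
        "docker", "run", "--rm",
        "--mount", f"type=bind,source={host_dir},target=/host",
        "--mount", f"type=volume,source={volume},target=/vol",
        # "src/." copies the directory's contents rather than the directory itself
        image, "cp", "-a", f"{src}/.", f"{dst}/",
    ]
```

The result can be run with `subprocess.run(copy_command("wdl_data", "/abs/path/to/inputs"), check=True)` before invoking miniwdl, and with `to_volume=False` afterwards to retrieve outputs.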
Right - those are excellent points, and I hadn't actually considered the data initialization issue before you brought it up 😅
However, I do think the tradeoffs for this functionality are the same as the EFS integration.
- If you're mounting an EFS volume or Docker volume, it's on you to make sure the input files are where they need to be before invoking miniwdl. It makes sense to include a HOWTO for this in the docs though!
- miniwdl can include warnings that modifying input/output files in mounted volumes is Very Sketchy and also Dangerous, but I'm not sure that preventing this takes priority over being able to run miniwdl in multiple contexts.
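As for the warnings, one cheap way to at least notice a task tampering with its inputs would be to record each input file's size and modification time before the task runs and compare afterwards. This is only a sketch under that assumption: `snapshot` and `find_modified` are hypothetical names, and a size/mtime check can be fooled by a task that restores both, whereas hashing would be stricter but expensive for large files.

```python
import logging
import os
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger("volume_input_check")  # hypothetical logger name


def snapshot(paths: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Record (size, mtime in ns) for each input file before the task runs."""
    result = {}
    for path in paths:
        st = os.stat(path)
        result[path] = (st.st_size, st.st_mtime_ns)
    return result


def find_modified(before: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return inputs that were changed or removed since the snapshot,
    logging a warning for each one."""
    changed = []
    for path, recorded in before.items():
        try:
            st = os.stat(path)
            current = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            current = None  # the task deleted or moved the input
        if current != recorded:
            logger.warning("task modified input file in shared volume: %s", path)
            changed.append(path)
    return changed
```

A runner could call `snapshot` just before starting a task container and `find_modified` once it exits. This detects the problem after the fact; it does not prevent it.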