cromwell
Feature request: volume mounting
To address the type of use case described in http://gatkforums.broadinstitute.org/gatk/discussion/comment/38188#Comment_38188
@vdauwera can you summarize the use case in the forum?
I don't have a good handle on the details but it looked like @ChrisL understood it well.
I found the forum entry while looking for a solution for mounting a Docker volume.
In my case I would like to run Ensembl VEP with Cromwell/WDL. Using VEP in cache/offline mode has many advantages, among them much better performance. When running VEP in cache mode it is necessary to have a large set of files locally installed. Downloading these files using the provided INSTALL.pl will be very inefficient. I plan for now to tar everything together and download and untar from a google bucket every time I run the task. However, it would be much better if I could mount a docker volume to the container running the task.
The way I see it, I would be able to define a snapshot in the runtime section of the task definition, along with the mount point where that snapshot should be available as a Docker volume. In the background, Cromwell would provision a disk from the snapshot, attach it to the VM, and add the corresponding -v /path/to/disk:/requested/mount/point option to the docker run command.
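A rough WDL sketch of what this could look like (the `disk_snapshot` and `mount_point` runtime keys are invented here purely to illustrate the proposal; Cromwell does not implement them):

```wdl
task vep_annotate {
  File vcf

  command {
    # Assumes the VEP cache has been made available at /opt/vep/.vep
    # by the (hypothetical) snapshot mount requested below
    vep --cache --offline --dir_cache /opt/vep/.vep -i ${vcf} -o annotated.vcf
  }

  runtime {
    docker: "ensemblorg/ensembl-vep"
    # Hypothetical keys -- not real Cromwell syntax: provision a disk
    # from this snapshot and bind-mount it into the container
    disk_snapshot: "vep-cache-snapshot"
    mount_point: "/opt/vep/.vep"
  }

  output {
    File annotated = "annotated.vcf"
  }
}
```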
Hope this helps define this issue.
Thanks for considering raising the priority of this.
We have a very similar use case. We'd like to be able to run a different annotator that has a massive pile of data sources (~20 GB). We want an easy way to package different sets of test files and make them available for people to use with our Docker image, without having to build a 20 GB Docker image.
Hi, the same problem as @CarlosBorroto ... ! Just wanted to push the issue!
Hi, same problem here. It would be a great addition to Cromwell. Thanks!
Hello @vdauwera, I have a similar use case in Cromwell that I think this could cover. We specifically hope to be able to mount a type=tmpfs volume. This creates a RAM disk, which we use to very quickly unpack data containing tens of thousands of files.
Google describes how to do this in their docs https://cloud.google.com/compute/docs/containers/configuring-options-to-run-containers#mounting_tmpfs_file_system_as_a_data_volume
We have had success using this in our Slurm-backed Cromwell by launching Docker through the submit script ourselves and passing docker run the mount parameter:
${'--mount type=tmpfs,destination='+mount_tmpfs}
It would be great if declaring a tmpfs mount point could also be supported by cromwell in google cloud submissions. Thanks!
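What that support might look like at the task level, sketched in WDL (the `tmpfs_mount` runtime key is invented for illustration; today this flag has to be injected via the backend's submit script as described above):

```wdl
task unpack_fast {
  File archive

  command {
    # Unpack tens of thousands of small files into the RAM disk
    tar -xf ${archive} -C /ramdisk
  }

  runtime {
    docker: "ubuntu:20.04"
    memory: "32 GB"
    # Hypothetical key: ask the backend to add
    # `--mount type=tmpfs,destination=/ramdisk` to its docker run invocation
    tmpfs_mount: "/ramdisk"
  }
}
```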
+1 on tmpfs. Currently, we have to create a directory under /dev/ and rely on the assumption that that directory is mounted by default as a tmpfs with half of the available RAM (at least on GCP). This is obviously not ideal, and delocalization of such files is problematic as well.
Our use case is exactly the same, to unpack/process tens or hundreds of thousands of small files (in a BCL). Doing so with any "normal" disk is much slower than with RAM.
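For the record, a minimal sketch of the workaround described above (it assumes the container's /dev/shm is a tmpfs sized at half of RAM, which may not hold on every backend):

```wdl
task unpack_bcl {
  File bcl_tar

  command {
    # /dev/shm is usually a tmpfs; unpacking tens of thousands of small
    # files there is much faster than on a persistent disk
    mkdir -p /dev/shm/bcl
    tar -xf ${bcl_tar} -C /dev/shm/bcl
  }

  runtime {
    docker: "ubuntu:20.04"
    memory: "64 GB"  # must be large enough to hold the unpacked data in RAM
  }
}
```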
Hi -- I know this is an old issue, but has there been any further discussion on how to mount persistent disks? We're using PAPIv2 as the backend, and we'd like to expose reference databases (stored as filesystems) to our docker containers via a mounted volume.
Hi @armedgorillas,
You can bypass this by calling the Docker container from the task itself. I wrote about it in the GATK forum a while ago; take a look at: https://gatkforums.broadinstitute.org/gatk/discussion/comment/50056#Comment_50056
Greetings, Selonka
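For anyone landing here, a minimal sketch of that workaround (it assumes the task can reach the host Docker daemon, so it only applies to local-style backends; the cache path and image are examples):

```wdl
task vep_via_host_docker {
  File vcf

  command {
    # The task itself invokes docker run, so the reference volume can be
    # mounted with -v. This requires access to the host Docker daemon,
    # which is why it does not carry over to cloud backends.
    cp ${vcf} input.vcf
    docker run --rm \
      -v /data/vep_cache:/opt/vep/.vep \
      -v $(pwd):/work \
      ensemblorg/ensembl-vep \
      vep --cache --offline --dir_cache /opt/vep/.vep \
          -i /work/input.vcf -o /work/annotated.vcf
  }

  output {
    File annotated = "annotated.vcf"
  }
}
```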
Thanks @Selonka! That looks like a nifty workaround.
Does this work with the Pipelines API backend, or just with a local backend?
I don’t think it works for the PAPI backend, because one needs to mount docker.sock
into the container to be able to invoke Docker commands. I’m honestly a little surprised it even works locally.
So, no progress on this issue for years?
+1. We have a similar need: our workflow runs in a Docker container and requires massive databases for the application to finish.