
Feature request: volume mounting

Open vdauwera opened this issue 7 years ago • 14 comments

To address the type of use case described in http://gatkforums.broadinstitute.org/gatk/discussion/comment/38188#Comment_38188

vdauwera avatar Apr 21 '17 15:04 vdauwera

@vdauwera can you summarize the use case in the forum?

katevoss avatar Sep 26 '17 14:09 katevoss

I don't have a good handle on the details but it looked like @ChrisL understood it well.

vdauwera avatar Sep 26 '17 16:09 vdauwera

I found the forum entry while looking for a way to mount a Docker volume.

In my case I would like to run Ensembl VEP with Cromwell/WDL. Using VEP in cache/offline mode has many advantages, among them much better performance. Running VEP in cache mode requires a large set of files to be installed locally. Downloading these files with the provided INSTALL.pl on every run would be very inefficient. For now I plan to tar everything together, then download and untar it from a Google bucket every time I run the task. However, it would be much better if I could mount a Docker volume into the container running the task.

The way I see it, I would be able to define a snapshot in the runtime section of the task definition. I would also be able to define the mount point (docker run -v *:{mount point}) where this snapshot would be available as a Docker volume. In the background, Cromwell would provision a disk from the snapshot, mount it on the VM, and run the container with the matching docker run -v /path/to/disk:/requested/mount/point option.

Hope this helps define the issue.
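A minimal sketch of the docker run invocation described above, assuming the snapshot disk is already mounted on the VM. The image name, host path, and mount point are hypothetical placeholders, and this is not how Cromwell builds its commands; it only illustrates the -v host:container mapping.

```python
import shlex


def docker_run_with_volume(image, host_path, mount_point, command):
    """Build a `docker run` argv that bind-mounts host_path at mount_point (read-only)."""
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be an absolute path inside the container")
    # -v HOST:CONTAINER:ro is Docker's bind-mount syntax; :ro keeps the cache read-only.
    volume = f"{host_path}:{mount_point}:ro"
    return ["docker", "run", "--rm", "-v", volume, image, *command]


argv = docker_run_with_volume(
    image="example/vep:latest",        # hypothetical image name
    host_path="/mnt/disks/vep-cache",  # hypothetical: where the snapshot disk is mounted on the VM
    mount_point="/opt/vep/.vep",       # hypothetical: mount point requested by the task
    command=["vep", "--cache", "--offline"],
)
print(shlex.join(argv))
```

Mounting read-only is a design choice for shared reference data; drop the :ro suffix if the task needs to write to the volume.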

Thanks for considering raising the priority of this.

CarlosBorroto avatar Oct 06 '17 11:10 CarlosBorroto

We have a very similar use case. We'd like to be able to run a different annotator that has a massive pile of data sources (~20 GB). We want an easy way to package different sets of test files and make them available for people to use with our Docker image, without having to build a 20 GB Docker image.

lbergelson avatar Dec 06 '17 20:12 lbergelson

Hi, the same problem as @CarlosBorroto ... ! Just wanted to push the issue!

Selonka avatar Feb 28 '18 09:02 Selonka

Hi, same problem here. This would be a great addition to Cromwell. Thanks!

vinash85 avatar Mar 12 '18 14:03 vinash85

Hello @vdauwera, I have a similar use case in Cromwell that I think this could cover. We specifically hope to be able to mount a type=tmpfs volume. This creates a RAM disk, which we use to unpack data consisting of tens of thousands of files very quickly.

Google describes how to do this in their docs: https://cloud.google.com/compute/docs/containers/configuring-options-to-run-containers#mounting_tmpfs_file_system_as_a_data_volume

We have had success with this on our Slurm Cromwell setup by launching the Docker container through submit ourselves and passing docker run the mount parameter:

${'--mount type=tmpfs,destination='+mount_tmpfs}

It would be great if declaring a tmpfs mount point could also be supported by Cromwell for Google Cloud submissions. Thanks!
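As a rough illustration of what that expression produces, here is a sketch that builds the same docker run arguments in Python. It assumes mount_tmpfs is an optional input, so nothing is added when it is unset; the size_bytes parameter is an addition for illustration and maps to Docker's tmpfs-size mount option.

```python
from typing import List, Optional


def tmpfs_mount_args(mount_tmpfs: Optional[str], size_bytes: Optional[int] = None) -> List[str]:
    """Return the extra `docker run` arguments for a tmpfs mount, or [] if none is requested."""
    if not mount_tmpfs:
        # No mount point given: contribute nothing to the command line.
        return []
    spec = f"type=tmpfs,destination={mount_tmpfs}"
    if size_bytes is not None:
        # tmpfs-size caps the RAM disk; without it Docker leaves the size unlimited.
        spec += f",tmpfs-size={size_bytes}"
    return ["--mount", spec]


print(tmpfs_mount_args("/ramdisk"))
print(tmpfs_mount_args("/ramdisk", size_bytes=8 * 1024**3))
print(tmpfs_mount_args(None))
```

Capping the size is worth considering because anything written to the tmpfs counts against the memory available to the task.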

jason-weirather avatar Mar 14 '18 16:03 jason-weirather

+1 on tmpfs. Currently, we have to create a directory under /dev/ and rely on the assumption that this directory is mounted by default as a tmpfs sized at half of the available RAM (at least on GCP). This is obviously not ideal. Delocalization of such files is problematic as well.

Our use case is exactly the same, to unpack/process tens or hundreds of thousands of small files (in a BCL). Doing so with any "normal" disk is much slower than with RAM.
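Rather than relying on that assumption, a task could check which filesystem actually backs its scratch directory. The sketch below parses text in the format of Linux's /proc/mounts and returns the filesystem type of the most specific mount containing a path. The sample mount table and directory names are made up for illustration.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point containing path, or None."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical mount table in /proc/mounts format (device, mount point, type, ...).
sample = """\
/dev/sda1 / ext4 rw 0 0
devtmpfs /dev devtmpfs rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
"""

print(filesystem_type("/dev/shm/bcl", sample))
print(filesystem_type("/cromwell_root/bcl", sample))
```

On a real machine the text would come from reading /proc/mounts. Note that the kernel escapes spaces in mount points there (as \040), which this sketch does not decode.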

dinvlad avatar Aug 17 '18 21:08 dinvlad

Hi -- I know this is an old issue, but has there been any further discussion on how to mount persistent disks? We're using PAPIv2 as the backend, and we'd like to expose reference databases (stored as filesystems) to our docker containers via a mounted volume.

armedgorillas avatar Mar 28 '19 16:03 armedgorillas

Hi @armedgorillas,

You can work around this by calling the Docker container from the task itself. I wrote about it in the GATK forum a while ago; take a look at: https://gatkforums.broadinstitute.org/gatk/discussion/comment/50056#Comment_50056

Greetings Selonka

Selonka avatar Mar 28 '19 23:03 Selonka

Thanks @Selonka! That looks like a nifty workaround.

Does this work with the Pipelines API backend, or just with a local backend?

armedgorillas avatar Mar 29 '19 11:03 armedgorillas

I don’t think it works for the PAPI backend, because one needs to mount docker.sock into the container to be able to invoke Docker commands. I’m honestly a little surprised it even works locally.

dinvlad avatar Mar 29 '19 14:03 dinvlad

So, no progress on this issue for years?

antonkulaga avatar Apr 11 '21 12:04 antonkulaga

+1. We have a similar need: a workflow that runs in a Docker container and requires massive databases for the app to finish.

valentynbez avatar Jan 14 '22 17:01 valentynbez