
Add support for Singularity images

Open antonkulaga opened this issue 9 years ago • 51 comments

In many use cases (especially HPC and systems where you cannot run as root), Singularity ( http://singularity.lbl.gov/about ) is better than Docker. It would be nice to see it supported in Cromwell.

antonkulaga avatar Apr 17 '17 21:04 antonkulaga

@katevoss I'm one of the developers of Singularity and I would like to +1 this request! I don't know scala, but if it comes down to making an equivalent folder like this one for Docker I can take a first stab at it. Or if it's more helpful I can give complete examples for all the steps to working with singularity images. We have a registry (Singularity Hub) that is hooked up to the singularity command line client to work with images. So - to integrate into cromwell you could either just run the container via a singularity command, or implement your own connection to our API to download the image. Please let me know how I might be helpful, and I'd gladly help. If you want me to give a go at scala I would just ask for your general workflow to compile and test functionality.

vsoch avatar Apr 20 '17 22:04 vsoch

Hi @vsoch, thanks for chiming in! Supporting Singularity is definitely on our to-do list but it will take some exploration so it would be great to have your assistance as we plan our approach. I can't be sure when we'll start but we'll definitely keep you in the loop. Thanks again!

katevoss avatar May 12 '17 23:05 katevoss

awesome! Yeah just ping on here when you are ready, and I'll be glad to help :)

vsoch avatar May 12 '17 23:05 vsoch

@geoffjentry I know I've heard Singularity come up fairly regularly, do you know if there are users who are insisting on using Singularity in order to use Cromwell?

katevoss avatar Sep 14 '17 13:09 katevoss

@katevoss I don't know of anyone who has said "I need this" or "If it existed I'd use Cromwell". Rather it's a topic which is building more steam across the space and I'm suggesting it'd be nice to be leaders and not followers here.

geoffjentry avatar Sep 18 '17 17:09 geoffjentry

As a user with images in Singularity, I want Cromwell to support using Singularity images (either via Singularity Hub and the command line, or connecting via API), so that I can use Singularity images and not have to duplicate them in Docker.

  • Effort: @geoffjentry ?
  • Risk: Small
  • Business value: Medium

katevoss avatar Sep 18 '17 20:09 katevoss

I can also offer to help, in whatever form is useful! If you just need to use / pull, then Singularity image support via installing it should fit the bill. Users can use Github to host images via Singularity Hub. If you want to host your own registry, then Singularity Registry is the way to go! Let me know if I can help, etc.

vsoch avatar Sep 18 '17 20:09 vsoch

@katevoss I've been thinking about trying to tackle this as my holiday break project. If nothing else I should have a better idea of what's involved on our side.

geoffjentry avatar Nov 16 '17 19:11 geoffjentry

👍 🎁 🎄 💯 🕎 🕯

katevoss avatar Nov 16 '17 19:11 katevoss

I'm checking out WDL/Cromwell at the moment and this feature would make Cromwell definitely more interesting. It would make it much easier to run reproducible pipelines without relying on docker. (Docker is a no go on our cluster because it gives users root access.)

rhpvorderman avatar Dec 19 '17 15:12 rhpvorderman

I just found out that Cromwell-Singularity integration will be on the agenda at Winter Codefest 2018, starting tomorrow! See https://docs.google.com/document/d/1RlDUWRFqMcy4V2vvkA1_ENsVo6TXge2wIO_Nf73Itk0/edit#heading=h.xg79ql4rt605

You can join in (also remotely) by checking this file: https://docs.google.com/spreadsheets/d/1o4xDUgl2iu_CgFuDpB1swtG8XVZK3aifvKlhh5qagyI/edit#gid=0

ps-account avatar Jan 17 '18 12:01 ps-account

@pimpim just a heads up that I threw that on there as a suggestion so it relies on people sharing the interest :)

We do expect to have udocker support soon via work being done by another group - I’ve heard rumors that one can run singularity via udocker so that might be another approach

geoffjentry avatar Jan 17 '18 13:01 geoffjentry

I also encountered the udocker-singularity route in the discussion on cwltool singularity integration. Maybe it is an idea to take a closer look at the udocker-singularity implementation as a starting point for workflow tool singularity usage.

Or maybe not, because you will lose HPC friendly singularity features this way!

ps-account avatar Jan 17 '18 13:01 ps-account

@geoffjentry In case this is accessible, can you point me to the udocker singularity work you mentioned?

ps-account avatar Jan 17 '18 13:01 ps-account

With udocker running in proot vs Singularity running in chroot, some HPC performance/IB/GPU capability issues might occur with this route.

ps-account avatar Jan 17 '18 15:01 ps-account

I just want to chime in and say that support for Singularity would be useful, it's nice to see that you are working on it!

oskarvid avatar Feb 09 '18 09:02 oskarvid

I support this as well.

jim-bo avatar Feb 16 '18 19:02 jim-bo

@geoffjentry is there any update on udocker support, or does that already work with some tricks?

I have the same question for singularity as well.

abdulrauf avatar May 01 '18 00:05 abdulrauf

Hey, I noticed that you guys use Google Cloud? http://cromwell.readthedocs.io/en/develop/wf_options/Google/ I have a builder that runs here, so there might be some synthesis between the two, although I'm not super familiar with Cromwell. If you just need to use Singularity containers your best bet is to do a singularity pull (and wrap these commands into your workflow functions, allowing the user to specify the container uri). If there is more of a service that someone is running with cromwell and you want to dip into the storage directly (and would use the API en masse) then we could try this --> https://cloud.google.com/storage/docs/requester-pays
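
A minimal sketch of the "wrap singularity pull" idea, assuming a POSIX shell. The function name pull_cmd and the example URIs are hypothetical, chosen only for illustration. The function prints the command rather than running it, so it can be tried on a machine without Singularity installed, and it only accepts the shub:// and docker:// URI schemes.

```shell
# Hypothetical helper: print the `singularity pull` command for a
# user-supplied container URI. Nothing is executed here; pipe the
# output to `sh` to actually pull the image.
pull_cmd() {
  uri="$1"
  case "$uri" in
    shub://*|docker://*)
      printf 'singularity pull %s\n' "$uri"
      ;;
    *)
      # Reject anything that is not an explicit container URI.
      echo "unsupported container URI: $uri" >&2
      return 1
      ;;
  esac
}

# Example URIs (assumptions, not taken from the thread).
pull_cmd "shub://vsoch/hello-world"
pull_cmd "docker://ubuntu:18.04"
```

Keeping the URI as the only input means the workflow author, not the wrapper, decides where the image comes from.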

vsoch avatar May 01 '18 01:05 vsoch

As far as I remember, one issue with some workflow managers concerned the naming of containers in the workflow format. E.g. CWL had/has Docker hard-coded into it. Some attention has been given at the last biohackathon, please check this link: https://twitter.com/biocrusoe/status/954738513475448835

ps-account avatar May 02 '18 15:05 ps-account

Hey friends! Just wanted to poke here again that this is still badly wanted / needed / desired / dreamed of / prayed for / sacrificial lambs... (you get the idea :P) Any updates? Can I help?

vsoch avatar Aug 08 '18 22:08 vsoch

Hi @vsoch - the first problem to solve is how to represent the usage of singularity in one's WDL (not sure how CWL does it, will need to look). This is being discussed in the OpenWDL group so if you have thoughts here that'd be very welcome.

For instance, is there a way to express "run this container" but not be locking a downstream WDL user into Singularity vs Docker?

geoffjentry avatar Aug 08 '18 22:08 geoffjentry

I'm not great / experienced with Cromwell, and to be honest I'm not sure what native support would mean. What I was trying is to just treat a singularity container like an executable, and add it as a Local backend, sort of like this --> https://github.com/vsoch/wgbs-pipeline/pull/1/files#diff-f6baca157827c4888c394eab694e000c

That works to run the analysis step (in a singularity container) just using singularity like any executable. I don't totally understand the job_id so there is a bug, but my colleague @bek is going to take a look! The container is run to produce the output, so that's a good start at least (and probably I'm missing something huge here).

So to answer your question... in my wdl at least, I'm just using the same local commands. It looks the same as it would running any Local backend configuration.

vsoch avatar Aug 08 '18 22:08 vsoch

Yeah doesn't it come down to:

# singularity
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=singularity cromwell-34.jar run runners/test.wdl -i data/TEST-YEAST/inputs.json -o workflow_opts/singularity.json

vs

# docker
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=docker cromwell-34.jar run runners/test.wdl -i data/TEST-YEAST/inputs.json -o workflow_opts/docker.json

So you use the same *.wdl but just choose a different backend and workflow opts?
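
To make the point concrete, here is a small sketch (POSIX shell assumed) in which the backend name is the only thing that varies between the two invocations above. The function name cromwell_cmd is hypothetical; the paths and jar name are the ones from the commands in this comment. It prints the command instead of running it.

```shell
# Hypothetical helper: print the Cromwell invocation for a given backend
# name. The WDL and inputs stay the same; only the default backend and the
# matching workflow options file change.
cromwell_cmd() {
  backend="$1"
  printf 'java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=%s cromwell-34.jar run runners/test.wdl -i data/TEST-YEAST/inputs.json -o workflow_opts/%s.json\n' \
    "$backend" "$backend"
}

cromwell_cmd singularity
cromwell_cmd docker
```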

vsoch avatar Aug 08 '18 22:08 vsoch

Oh interesting, so in this way it's more similar to the udocker hacks that people have used (which I can't find an example of right now, but they exist). That could certainly get us most of the way there, if not all of the way there.

I think I've been viewing Singularity & Docker as more of an "either/or" in that perhaps a task would require a singularity container vs a docker container - but if that's not really the case I've definitely been overcomplicating the matter. I'll admit that I've never been comfortable in my understanding of Singularity.

@vsoch you're obviously well versed in all things Singularity - do you see any utility to defining the use of a Singularity container in the WDL (i.e. no matter what, this task should always use Singularity) or is it going to be more of a site specific situation, like what you're showing here?

geoffjentry avatar Aug 10 '18 17:08 geoffjentry

> I think I've been viewing Singularity & Docker as more of an "either/or" in that perhaps a task would require a singularity container vs a docker container - but if that's not really the case I've definitely been overcomplicating the matter. I'll admit that I've never been comfortable in my understanding of Singularity.

If you are using a container, it definitely is an "either / or" in the sense that getting one working inside the other is pretty challenging. The reason a Dockerized cromwell doesn't work on a host (to submit jobs to other docker or singularity containers) is because of having the docker/singularity submit come from inside the container. We don't really want to do that anyway, because there is a double dependency. But on the other hand, we want to provide reproducible solutions, meaning that things are container based. In an ideal setup, I would have some (still container based) cromwell acting as more of a docker-compose setup, and issuing commands to other containers. Ideally there would be one maintained Docker container for a step in a pipeline, and then if it's run on an HPC resource (where you can't have docker) it would just be dumped into singularity (docker://<username>/<reponame>)

But this case is a little different - I'm just talking about the cromwell "plugin". I don't actually understand why this is necessary, at least given that singularity containers can act like executables. If I want to run a python script, I run it in the command section, as an executable. I don't require a python plugin. Now, if Singularity changes so that we want to take advantage of more of the instance commands (e.g., we can start, stop, get a status), this might make it more like docker and warrant a plugin. But for now, it's not quite there, and making a plugin would just be a really fancy interface to run an executable. Does this make sense?

> @vsoch you're obviously well versed in all things Singularity - do you see any utility to defining the use of a Singularity container in the WDL (i.e. no matter what, this task should always use Singularity) or is it going to be more of a site specific situation, like what you're showing here?

I don't think it would be site specific (if the container is singularity, it would largely be the same, a container_uri and then some args to it). The only reason I have two sections is because I was trying out two ways to do it. Neither of them fully works (at least according to cromwell) because I don't know what that job_id business is :)

vsoch avatar Aug 10 '18 18:08 vsoch

Hi @vsoch - to be clear, what I mean is this ...

If I'm writing a WDL and I want to put some container in the runtime block, should I be opinionated as to if it's singularity or docker or should that be up to the person running the WDL? I used to view it as the former, but now I think it's the latter?

geoffjentry avatar Aug 10 '18 19:08 geoffjentry

Wouldn't it be up to the person running the wdl? If it's not up to me, how am I empowered to say I am using slurm vs a container environment like kubernetes? To be clear, I've only used Cromwell a day and a half so I'm not the right person to answer this question. I'm trying to understand how Singularity would fit in beyond being a binary executable (that might work in several environments). I think @bek might be able to weigh in?

vsoch avatar Aug 10 '18 20:08 vsoch

Had a convo w/ Seth yesterday and looked into a few similar things (e.g. cwltool's support). I think the proper plan is as follows:

  • Explore the path you've been looking at, by changing the configuration of a Cromwell backend to use Singularity instead of docker, but just for docker containers. This would cover the most common use cases
  • Separately continue the conversation at OpenWDL to explore what support for native Singularity containers might look like in WDL
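
The first bullet could be sketched roughly as follows (POSIX shell assumed). This is not Cromwell's actual backend code: container_cmd is a hypothetical helper that prints the command line a site might use for the same Docker image reference, either through Docker or through Singularity's docker:// URI support. The image name in the usage lines is an assumption.

```shell
# Hypothetical helper: given a runtime, a Docker image reference and a
# command, print the equivalent container invocation. Nothing is executed.
container_cmd() {
  runtime="$1"
  image="$2"
  shift 2
  case "$runtime" in
    docker)
      printf 'docker run --rm %s %s\n' "$image" "$*"
      ;;
    singularity)
      # Singularity can run Docker images directly via a docker:// URI.
      printf 'singularity exec docker://%s %s\n' "$image" "$*"
      ;;
    *)
      echo "unknown runtime: $runtime" >&2
      return 1
      ;;
  esac
}

# Same image and command, different site-level runtime choice.
container_cmd docker ubuntu:18.04 cat /etc/os-release
container_cmd singularity ubuntu:18.04 cat /etc/os-release
```

The design point is that the WDL keeps naming a Docker image, and the choice of runtime stays with whoever configures the backend.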

geoffjentry avatar Aug 11 '18 23:08 geoffjentry

Aye aye! I don't know scala, but I found the developer docs and I know how to use GitHub, so I'm ready to go, lol. I likely won't start this weekend (I have a few projects I'm working on!) but next week for sure. I'll put updates, troubles, and other musings here - thanks in advance for your help :)

vsoch avatar Aug 12 '18 00:08 vsoch