Can Cuneiform run tasks in separate Docker containers?
I want to use Cuneiform to run bioinformatics pipelines. Nextflow and Loom can run tasks in separate Docker containers. Can Cuneiform do this as well?
Cheers, Zhen Zhang
Hello Zhang,
I am pleased to hear that you want to give Cuneiform a try. I think that Docker and bioinformatics workflow systems are a great combination.
The Nextflow community has put a lot of effort into their Docker integration, and I think the workflow system is a formidable choice. It has thorough documentation and a vibrant community.
With Cuneiform we have taken a somewhat different approach. First of all, we make a distinction between the workflow language and the distributed execution environment. Cuneiform, the language, is really just an interpreter. To the outside world, this interpreter is a service that communicates with a distributed execution environment, which is itself a service. The worker nodes that perform the actual work in the distributed execution environment are services, too.
There is, of course, absolutely nothing that keeps you from starting all of these services in Docker containers or from managing the life cycle of such a service composition in Kubernetes. In fact, that is exactly the use case Docker and Kubernetes aim for.
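Just to illustrate the idea, here is a minimal sketch with plain docker commands. All image names (cuneiform/master and so on) and the CUNEIFORM_MASTER variable are hypothetical placeholders, not published artifacts:

# all image names and environment variables are hypothetical placeholders
docker network create cf-net

# the distributed execution environment, itself a service
docker run -d --name master --network cf-net cuneiform/master:latest

# two worker services that connect to the execution environment
docker run -d --network cf-net -e CUNEIFORM_MASTER=master cuneiform/worker:latest
docker run -d --network cf-net -e CUNEIFORM_MASTER=master cuneiform/worker:latest

# the interpreter service, talking to the same execution environment
docker run -it --rm --network cf-net -e CUNEIFORM_MASTER=master cuneiform/client:latest

The same composition could just as well be described in a Compose file or a set of Kubernetes manifests.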
To answer your question: the Cuneiform interpreter, being a service with very limited functionality, has no explicit Docker integration. However, I would say that Cuneiform is designed to play well with Docker while not making the decision for you as to which containerization, orchestration, or virtualization technology to use. It maximizes your freedom in this respect.
Hi, I'm currently integrating Docker into our Hadoop-based workflow management system Hi-WAY. I am continuing the research and development of my colleague Marc Bux, who is about to finish his PhD. One of my goals (and it shouldn't be too difficult) is a design where a single Docker container can be used to execute a Cuneiform workflow both locally (on a powerful host with Docker installed) and distributed (on a cluster with Hadoop and Docker installed). Since research is my primary focus, I'm careful about making promises, but I think there will be a release before Christmas. Let's stay in touch; I'm also interested in your use case.
Right now I'm learning Hi-WAY. The adaptive scheduling interests me greatly. Could the adaptive scheduling also be implemented on Docker Swarm?
I'm not against the idea of using a single Docker container to run a Cuneiform workflow. However, a typical bioinformatics workflow consists of a few steps, such as checking read quality, removing adapters from reads, mapping, and calling variants. The better solution with Docker is for each step to run its tasks in its own container, as sketched below.
Recently, I wrote a web-based PGS analysis application for my company. It has a Vue.js-based frontend, a Flask-based backend, a socket.io-based notifier, and several Celery-based background workers. Snakemake is used to run the workflow on each worker. I use Docker for testing and deployment, not for running the tasks of a workflow. In my opinion, Snakemake can do a similar job to Cuneiform, but Cuneiform outperforms Snakemake in terms of parallelism. If Cuneiform gets Docker support, I will replace Snakemake with Cuneiform.
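To make the per-step idea concrete, the pattern looks roughly like this (the image names are illustrative placeholders; in practice you would substitute real tool images, e.g. from the BioContainers project):

# image names are placeholders; substitute real tool images
docker run --rm -v "$PWD:/data" -w /data example/fastqc:latest \
  fastqc sample.fastq.gz
docker run --rm -v "$PWD:/data" -w /data example/cutadapt:latest \
  cutadapt -a AGATCGGAAGAGC -o trimmed.fastq.gz sample.fastq.gz
docker run --rm -v "$PWD:/data" -w /data example/bwa:latest \
  bwa mem ref.fa trimmed.fastq.gz > aligned.sam

Each step is isolated in its own image, so tool versions are pinned per step instead of per workflow.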
The Hi-WAY questions are best answered by @carlwitt and @marcbux, because Marc was the one who conceived the system and Carl is currently experimenting with Docker support.
A single container per workflow is, I think, just a fall-back option to increase reproducibility and give the user the most freedom in how to use each Docker container. I agree that orchestrating many distributed Docker containers is the more interesting use case.
I would disagree that Cuneiform outperforms Snakemake in general. I think Cuneiform shines only when you cannot tell the workflow structure a priori. For use cases where a static workflow structure is sufficient, I would expect Snakemake to perform just as well.
What I'm currently working on is running each task in its own Docker container. However, I am working on this as a feature of our Hadoop-based execution engine (Hi-WAY), not of the Cuneiform language (which comes with its own execution engine). In the future, I would additionally like to support running the whole workflow in a single container for situations where a single, well-equipped machine is available (e.g., with a few dozen cores and some TB of main memory). A rough sketch of both modes follows below.
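To sketch the two modes (the image names, paths, and the hiway command line below are hypothetical placeholders, not the actual Hi-WAY interface):

# per-task mode: the execution engine wraps each task invocation, roughly like
docker run --rm -v /scratch/task42:/work -w /work alpine:latest sh run_task.sh

# single-container mode: the whole workflow runs inside one container
docker run --rm -v "$PWD:/work" -w /work hiway/all-in-one:latest hiway workflow.cf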
No one has added to this discussion in quite a while. I am closing the issue now.
I am reopening this issue because I think that having tasks run in their own Docker containers would be a very valuable addition.
The choice of the container image could be integrated into the language as an optional foreign-function decoration:
def myfun() -> <out : String> in bash use alpine:latest *{
...
}*
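A complete function under this proposed syntax might look as follows. The use clause is the proposed extension; the body follows Cuneiform's existing convention of binding an output by assigning a Bash variable of the same name. Treat this as a sketch of the proposal, not implemented behavior:

def greet() -> <out : String> in bash use alpine:latest *{
  out="Hello from inside an Alpine container"
}*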