Support non-docker runtime

Open joshua-gould opened this issue 3 years ago • 5 comments

It would be useful on HPC systems to support running a task in the current environment, without Docker. I know that Singularity support has also been proposed, and that would be useful too.

joshua-gould avatar Oct 14 '20 18:10 joshua-gould

I am eager for this option too.

It seems that filtering out the docker property in runtime should work.

However, within the command, paths like /usr/gitc/gatk4/gatk-launch would need to be changed to something like gatk-launch.
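
A minimal sketch of the two adjustments described above, assuming the runtime section is available as a plain dictionary and the command as a single shell line. The function names, the image name, and the GATK arguments are hypothetical, chosen only for illustration; none of this is part of miniwdl.

```python
import posixpath
import shlex


def strip_container_keys(runtime, keys=("docker",)):
    """Return a copy of the runtime settings without container-specific keys."""
    return {k: v for k, v in runtime.items() if k not in keys}


def use_path_lookup(command_line):
    """Replace an absolute executable path with its basename.

    The executable is then resolved through PATH in the current
    environment instead of a location that exists only in the image.
    """
    tokens = shlex.split(command_line)
    if tokens and posixpath.isabs(tokens[0]):
        tokens[0] = posixpath.basename(tokens[0])
    return shlex.join(tokens)


# Illustrative values only (not taken from a real workflow).
runtime = {"docker": "broadinstitute/gatk", "memory": "4 GB"}
print(strip_container_keys(runtime))
print(use_path_lookup("/usr/gitc/gatk4/gatk-launch HaplotypeCaller -I in.bam"))
```

This only handles the first token of a one-line command; a real task command can span many lines and call several container-specific paths, so each would need the same treatment.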

galaxy001 avatar Oct 15 '20 06:10 galaxy001

Thanks! Although we haven't been able to prioritize this yet, we have been evolving a plugin interface (https://github.com/chanzuckerberg/miniwdl/blob/v0.9.x/WDL/runtime/task_container.py) for the task runner back-end, which will eventually facilitate it. To clarify, in an HPC cluster, wouldn't you also need a little something to submit jobs to the scheduler? cc @kislyuk @lynnlangit

mlin avatar Oct 15 '20 17:10 mlin

An initial goal is to run a workflow without Docker on a single node in a cluster. The next step would be to run across multiple nodes. Thanks.

joshua-gould avatar Oct 15 '20 17:10 joshua-gould

The Broad sometimes uses Singularity containers (often with the dsub [or qsub] utility) to run some pipelines on its internal HPC cluster. Imperial College London uses the same configuration.

lynnlangit avatar Oct 15 '20 21:10 lynnlangit

@joshua-gould at CZI, we have been using miniwdl extensively to run workflows on a single instance, with each task running in a Docker container (so a Docker daemon is required on the instance). We do consider the Docker containerization to be essential for reproducibility and isolation purposes.

We do also want to keep miniwdl flexible and appealing in a broad array of runtime environments, while keeping its development sustainable. To that end @mlin has been carefully arranging the miniwdl codebase to be extensible via a set of plugin APIs so that workflow/task I/O as well as task container dispatch APIs other than Docker (like Singularity) can be supported in the future.

I've been working with @mlin on an AWS Fargate-based runtime (https://github.com/chanzuckerberg/miniwdl-plugins/tree/master/aws-fargate) where each task runs within a Fargate task (Fargate is a container-as-a-service API, so you can think of it as a Docker daemon in the cloud). The miniwdl instance managing the workflow (the "little something" @mlin is referring to) then runs on an arbitrary lightweight VM that supervises the workflow from start to finish.

In a similar vein, if folks are interested in Singularity or "dsub/qsub based dispatch" support ASAP, they may want to look into writing a miniwdl plugin for that. (Since the documentation on plugin development is still a little scarce, in the near term you may want to jump on a call with @mlin to get the process kick-started, but the basic blueprint can be seen in the plugin codebase I linked above.)
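
To make the idea of pluggable task dispatch concrete, here is a rough, self-contained sketch. It is not miniwdl's actual plugin API: the `TaskBackend` interface, the `BACKENDS` registry, and `run_task` are all hypothetical names. The only external behavior assumed is that `singularity exec docker://<image> <command>` runs a command inside an image pulled from a Docker registry.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import List, Optional


class TaskBackend(ABC):
    """Hypothetical interface for illustration; NOT miniwdl's plugin API."""

    @abstractmethod
    def build_argv(self, image: Optional[str], command: str) -> List[str]:
        """Return the argv list that executes `command` for this backend."""


class LocalBackend(TaskBackend):
    """Runs the command in the current environment, ignoring any image."""

    def build_argv(self, image, command):
        return ["sh", "-c", command]


class SingularityBackend(TaskBackend):
    """Wraps the command in `singularity exec` using a Docker-registry image."""

    def build_argv(self, image, command):
        if not image:
            raise ValueError("the Singularity backend needs an image name")
        return ["singularity", "exec", f"docker://{image}", "sh", "-c", command]


# Hypothetical registry mapping a configured backend name to its class.
BACKENDS = {"local": LocalBackend, "singularity": SingularityBackend}


def run_task(backend_name: str, image: Optional[str], command: str) -> str:
    """Dispatch one task command through the selected backend; return stdout."""
    argv = BACKENDS[backend_name]().build_argv(image, command)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_task("local", None, "echo hello")` runs the command directly on the host, while choosing `"singularity"` changes only how the argv is built. A scheduler-based backend (dsub or qsub) would fit the same shape, but its submission flags are site-specific and are left out here.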

kislyuk avatar Oct 16 '20 21:10 kislyuk