common-workflow-language
Generalised ContainerRequirement
Docker is great, but other container packaging and execution tools exist. It would be unfortunate for workflow authors to have to declare a hard dependency on Docker when no Docker-specific features are actually required.
I suggest that CWL would benefit from a more generic `ContainerRequirement`. It would extend `ProcessRequirement` and specify the minimum set of parameters needed to support containerisation without any Docker-specific features.

This could simply consist of:
- a content address that describes the contents of the container rootfs;
- a pointer to where and how to obtain the rootfs (i.e. a URL);
- the semantics of how to map a `CommandLineTool` into the generic container.

The ability to run a command line inside a container seems like it ought to be fundamental to any container executor. Still, we may need to be explicit about what the default environment should be. Both the content address and the URL could be optional (`required=False`).
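To make "a content address that describes the contents of the container rootfs" concrete, here is a minimal sketch. It assumes that a rootfs is a directory tree and that its address is a SHA-256 over sorted relative paths, file types, permission bits, file contents and symlink targets. It deliberately ignores ownership and timestamps. The field names `content_address` and `url` and the helper names are hypothetical, not part of any CWL schema. A real specification would have to pin down these hashing details exactly.

```python
import hashlib
import os
import stat
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerRequirement:
    # Hypothetical field names; both optional, as suggested above.
    content_address: Optional[str] = None  # e.g. "sha256:<hex>" of the rootfs
    url: Optional[str] = None              # where to obtain the rootfs


def rootfs_content_address(root: str) -> str:
    """Deterministically hash a directory tree: relative paths, types, modes, contents.

    Ownership and timestamps are intentionally left out of the hash.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    digest = hashlib.sha256()
    for rel in sorted(os.path.relpath(p, root) for p in paths):
        full = os.path.join(root, rel)
        st = os.lstat(full)
        # Path, file type and permission bits, NUL-separated.
        digest.update(os.fsencode(rel))
        digest.update(f"\0{stat.S_IFMT(st.st_mode)}\0{stat.S_IMODE(st.st_mode)}\0".encode())
        if stat.S_ISREG(st.st_mode):
            with open(full, "rb") as fh:
                digest.update(hashlib.sha256(fh.read()).hexdigest().encode())
        elif stat.S_ISLNK(st.st_mode):
            digest.update(os.fsencode(os.readlink(full)))
        digest.update(b"\n")  # entry terminator
    return "sha256:" + digest.hexdigest()


def matches(req: ContainerRequirement, root: str) -> bool:
    """An unset content address places no constraint on the rootfs."""
    return req.content_address is None or req.content_address == rootfs_content_address(root)
```

Making both fields optional means `ContainerRequirement()` on its own just says "run me in some container". Supplying `content_address` turns it into a verifiable pin.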
The existing `DockerRequirement` would be retained, but modified to extend `ContainerRequirement` rather than extending `ProcessRequirement` directly. The purpose of `DockerRequirement` would be to specify Docker-specific functionality, including how to obtain a container using docker pull/load/file/imageid.
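The proposed type hierarchy can be sketched with Python dataclasses. The Docker fields `dockerPull`, `dockerLoad`, `dockerFile` and `dockerImageId` mirror fields of CWL v1.0's `DockerRequirement`. `contentAddress`, `url` and the `needs_container` helper are hypothetical. The point is that a runner which only understands the generic type still recognises a `DockerRequirement` as "needs a container".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessRequirement:
    """Base of all requirements (stands in for the CWL schema's abstract type)."""


@dataclass
class ContainerRequirement(ProcessRequirement):
    # Hypothetical generic fields from the proposal; both optional.
    contentAddress: Optional[str] = None
    url: Optional[str] = None


@dataclass
class DockerRequirement(ContainerRequirement):
    # Docker-specific ways to obtain the image (names as in CWL's DockerRequirement).
    dockerPull: Optional[str] = None
    dockerLoad: Optional[str] = None
    dockerFile: Optional[str] = None
    dockerImageId: Optional[str] = None


def needs_container(reqs: list) -> bool:
    """Any ContainerRequirement subtype, Docker included, implies containerisation."""
    return any(isinstance(r, ContainerRequirement) for r in reqs)
```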
It would be valid to specify both `ContainerRequirement` and `DockerRequirement`. The semantics would depend on whether each is specified in `requirements` or in `hints`. Here is one way it might work:
- `ContainerRequirement` in `requirements`, `DockerRequirement` in `requirements`: specifically require the Docker container executor. After obtaining the Docker image as specified in the `DockerRequirement`, calculate the rootfs content address and require that it matches the one specified in the `ContainerRequirement` (if specified).
- `ContainerRequirement` in `requirements`, `DockerRequirement` in `hints`: require containerisation, and require that the rootfs matches the specified content address. The implementation can choose how to obtain the image: from the URL specified in `ContainerRequirement`, or via the Docker-specific mechanism specified in `DockerRequirement`. In future it could also use another container system's mechanism if one is hinted as well.
- `ContainerRequirement` in `hints`, `DockerRequirement` in `requirements`: specifically require Docker and obtain the container following Docker-specific semantics. Check the content address of the container if one is specified in `ContainerRequirement`, but continue running (perhaps with a warning) if it doesn't match what was obtained via Docker.
- `ContainerRequirement` in `hints`, `DockerRequirement` in `hints`: the implementation is free to use Docker, use another containerisation system, or not use containerisation at all. Warnings may be issued if the `ContainerRequirement` rootfs content address does not match what Docker provides.
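The four combinations amount to a small decision table: which engine is acceptable, and whether a content-address mismatch is fatal or only a warning. The following is a minimal sketch of that table. The `Plan` type, the engine labels and the helper names are all hypothetical.

```python
import warnings
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    engine: str       # "docker", "any-container" or "any-or-none"
    on_mismatch: str  # "fail" or "warn" when the rootfs content address differs


# (section holding ContainerRequirement, section holding DockerRequirement) -> Plan
_PLANS = {
    ("requirements", "requirements"): Plan("docker", "fail"),
    ("requirements", "hints"): Plan("any-container", "fail"),
    ("hints", "requirements"): Plan("docker", "warn"),
    ("hints", "hints"): Plan("any-or-none", "warn"),
}


def plan(container_section: str, docker_section: str) -> Plan:
    """Look up the behaviour for one of the four combinations."""
    try:
        return _PLANS[(container_section, docker_section)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {container_section}, {docker_section}"
        ) from None


def check_rootfs(p: Plan, expected: Optional[str], actual: str) -> bool:
    """Compare content addresses; raise or warn on mismatch depending on the plan."""
    if expected is None or expected == actual:
        return True
    if p.on_mismatch == "fail":
        raise RuntimeError(f"rootfs {actual} does not match required {expected}")
    warnings.warn(f"rootfs {actual} does not match hinted {expected}")
    return False
```

In short, the placement of `DockerRequirement` decides the engine, and the placement of `ContainerRequirement` decides how strict the content check is.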
In future, other container systems could be added to support their own semantics or image discovery mechanisms. The common `ContainerRequirement` could always be specified alongside them to enforce checking of the rootfs content address.
It would be nice if `ContainerRequirement` were general enough that a runtime could implement it with a full VM rather than a lightweight Linux container, or perhaps even by booting a physical machine with the specified rootfs. To support this, `ContainerRequirement` could also have parameters specifying the machine architecture and the OS/kernel version.

Alternatively, those constraints could be captured in one or more separate high-level requirements (perhaps `MachineRequirement`?). The problem with splitting them up is that some steps could specify ways to realise them on multiple architectures. In other words, each `ContainerRequirement` is tied to a specific architecture and kernel.
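The "multiple architectures" case can be illustrated by keeping architecture and kernel constraints inside each `ContainerRequirement` and letting the runner pick the first candidate the host can satisfy. This is a hypothetical sketch. The fields `arch`, `os_name` and `min_kernel` are invented for illustration. They are compared against Python's `platform.machine()`, `platform.system()` and `platform.release()`.

```python
import platform
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ContainerRequirement:
    # Hypothetical fields; None means "no constraint".
    rootfs_url: str
    arch: Optional[str] = None                    # vs platform.machine(), e.g. "x86_64"
    os_name: Optional[str] = None                 # vs platform.system(), e.g. "Linux"
    min_kernel: Optional[Tuple[int, ...]] = None  # e.g. (3, 10)


def parse_kernel(release: str) -> Tuple[int, ...]:
    """Extract the leading numeric version, e.g. "5.15.0-91-generic" -> (5, 15, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", release)
    return tuple(int(part) for part in m.group(0).split(".")) if m else ()


def satisfied(req: ContainerRequirement, machine: str, system: str, release: str) -> bool:
    if req.arch is not None and req.arch != machine:
        return False
    if req.os_name is not None and req.os_name != system:
        return False
    if req.min_kernel is not None and parse_kernel(release) < req.min_kernel:
        return False
    return True


def choose(candidates: Sequence[ContainerRequirement],
           machine: Optional[str] = None,
           system: Optional[str] = None,
           release: Optional[str] = None) -> Optional[ContainerRequirement]:
    """Return the first candidate this host can run; host facts default to the current machine."""
    machine = machine or platform.machine()
    system = system or platform.system()
    release = release or platform.release()
    for req in candidates:
        if satisfied(req, machine, system, release):
            return req
    return None
```

Keeping the constraints on each candidate, rather than in a separate `MachineRequirement`, is what lets one step offer, say, both an aarch64 rootfs and an x86_64 rootfs.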
It would be great to have a more abstract requirement and multiple sets of hints on how to fulfil it. For example:
```yaml
requirements:
  class: CommandMatchesRequirement
  command: mytool --version
  matchesSubstring: 2.5.1  # Or e.g. "matchesPattern" for regex
hints:
  - class: DockerRequirement
    dockerPull: mytool:latest
  - class: RequirementSet
    requirements:
      - class: ContainerRequirement
        rootfsURL: https://example.com/fs
      - class: EnvVarRequirement
        envDef: {"envName": "MYVAR", "envValue": "foo"}
```
The above should be read as: "The output of `mytool --version` must contain `2.5.1`. Here are two ways to make that happen: Docker, or a container with its rootfs fetched from a URL and `MYVAR=foo` set in the environment."
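Here is a minimal sketch of how a runner might evaluate the proposed (hypothetical) `CommandMatchesRequirement` and try the hinted alternatives in order. Actually realising a Docker image or a rootfs is out of scope here. Each alternative is modelled only as extra environment variables, in the spirit of `EnvVarRequirement`. Several behaviours are assumptions: searching both stdout and stderr, treating a non-zero exit status as "no match", and running the command through `/bin/sh`.

```python
import os
import re
import subprocess
from typing import Mapping, Optional, Sequence


def command_matches(req: Mapping[str, str],
                    extra_env: Optional[Mapping[str, str]] = None) -> bool:
    """Run req["command"] through the shell and test its output.

    Assumptions: stdout and stderr are both searched (many tools print
    --version to stderr), and a non-zero exit status counts as no match.
    """
    env = {**os.environ, **(extra_env or {})}
    result = subprocess.run(req["command"], shell=True, env=env,
                            capture_output=True, text=True)
    if result.returncode != 0:
        return False
    output = result.stdout + result.stderr
    if "matchesSubstring" in req:
        return req["matchesSubstring"] in output
    if "matchesPattern" in req:
        return re.search(req["matchesPattern"], output, re.MULTILINE) is not None
    raise ValueError("expected matchesSubstring or matchesPattern")


def first_satisfying(req: Mapping[str, str],
                     alternatives: Sequence[Mapping[str, str]]) -> Optional[int]:
    """Return the index of the first alternative environment that satisfies req."""
    for index, extra_env in enumerate(alternatives):
        if command_matches(req, extra_env):
            return index
    return None
```

For example, `first_satisfying({"command": "echo $MYVAR", "matchesPattern": r"^foo$"}, [{"MYVAR": "bar"}, {"MYVAR": "foo"}])` picks the second alternative (index 1).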
After running a tool or workflow, a helper tool could produce a version of it in which the environment setup that was actually used is moved into `requirements`, for recomputability.
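Such a helper could be as simple as promoting the hint that was actually used into `requirements`. This sketch works on the parsed YAML as a Python dict. It assumes the runner reports which hint it used as an index into `hints`. The function name and that reporting mechanism are hypothetical. A `RequirementSet` hint contributes its member requirements individually.

```python
import copy
from typing import Any, Dict


def pin_environment(doc: Dict[str, Any], used_hint: int) -> Dict[str, Any]:
    """Return a copy of doc in which hints[used_hint] is promoted to requirements.

    The input document is left unchanged.
    """
    pinned = copy.deepcopy(doc)
    used = pinned["hints"].pop(used_hint)
    reqs = pinned.get("requirements", [])
    if isinstance(reqs, dict):  # a single requirement written as a mapping
        reqs = [reqs]
    if used.get("class") == "RequirementSet":
        reqs.extend(used["requirements"])  # unpack the set's members
    else:
        reqs.append(used)
    pinned["requirements"] = reqs
    if not pinned["hints"]:
        del pinned["hints"]
    return pinned
```

Unused alternatives are left in `hints`. A stricter variant could drop them so that the pinned document admits only the environment that was actually used.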
Can we use the `RequirementSet` approach to extract `MachineRequirement` nicely?
Interesting. `CommandMatchesRequirement` could also be a generic way to verify which container one has. The check that you have what you are supposed to have can be codified as a command run inside the container.
For example:
```yaml
requirements:
  class: CommandMatchesRequirement
  command: find / -fstype "proc" -prune -o -fstype "sysfs" -prune -o -fstype "devpts" -prune -o -print | LC_ALL=C sort | cpio -o | md5sum | awk '{print $1}'
  matchesSubstring: 6fe91993f4046da88b4639c0671f36cc
hints:
  - class: DockerRequirement
    dockerPull: ubuntu:14.04
```
However, this would be a very expensive operation, and it would be best if an implementation cached the result. Could we specify that all `CommandMatchesRequirement` commands must be pure, so that their results can be cached for a particular environment?
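If commands are declared pure, caching reduces to memoising on the pair (environment identity, command). Below is a minimal in-memory sketch. The class name is hypothetical, and the way a command is actually executed is injected as a `run` callable. One design point matters here: the environment key should be immutable, such as an image digest or rootfs content address. A movable tag like `ubuntu:14.04` can point to different images over time.

```python
from typing import Callable, Dict, Tuple


class PureCommandCache:
    """Memoise command output per (environment id, command).

    Only valid if commands are pure: their output depends solely on the
    environment (e.g. an image digest or rootfs content address) and the command.
    """

    def __init__(self, run: Callable[[str, str], str]) -> None:
        self._run = run  # run(environment_id, command) -> output
        self._cache: Dict[Tuple[str, str], str] = {}

    def output(self, environment_id: str, command: str) -> str:
        key = (environment_id, command)
        if key not in self._cache:
            self._cache[key] = self._run(environment_id, command)
        return self._cache[key]
```

A persistent variant could store the same key/value pairs on disk, so that the expensive rootfs checksum runs at most once per image digest.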
A few ideas for alternative ways of hosting a runtime that is sufficient to satisfy tool dependencies but isn't Docker:
- chroot() + bind mounts
- Lightweight virtual machines & minimal OS images that can boot in a few seconds
- PRoot: http://proot.me/
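To show how a generic `ContainerRequirement` might map a `CommandLineTool` onto one of these non-Docker runtimes, here is a sketch that builds a PRoot command line. It assumes PRoot's `-r` (guest rootfs), `-b host:guest` (bind a host path into the guest) and `-w` (initial working directory) options. The function itself is hypothetical and does not set up `/proc`, `/dev` or other bindings a real tool may need.

```python
from typing import List, Mapping, Optional, Sequence


def proot_argv(rootfs: str, command: Sequence[str],
               binds: Optional[Mapping[str, str]] = None,
               workdir: str = "/") -> List[str]:
    """Build a PRoot command line that runs `command` inside `rootfs`.

    -r sets the guest rootfs, -w the initial working directory, and each
    -b host:guest makes a host path visible inside the guest.
    """
    argv = ["proot", "-r", rootfs, "-w", workdir]
    for host_path, guest_path in (binds or {}).items():
        argv += ["-b", f"{host_path}:{guest_path}"]
    return argv + list(command)
```

For example, `proot_argv("/srv/rootfs", ["mytool", "--version"], {"/data": "/mnt/data"})` yields `["proot", "-r", "/srv/rootfs", "-w", "/", "-b", "/data:/mnt/data", "mytool", "--version"]`, which could be passed to `subprocess.run`.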
I am all for @ntijanic's proposal in https://github.com/common-workflow-language/common-workflow-language/issues/80#issuecomment-121213349
Recently @tetron linked to:
- https://www.opencontainers.org/
- https://github.com/opencontainers/specs
This ties in very closely with the work I will be doing with @mr-c, a brief overview of which is in the wiki at Userspace Container Review.
A tangent on this issue discussed specifying software requirements, which is now covered by http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwareRequirement
@jrandall, is there a specific containerization technology that you'd like to see as a sibling to `DockerRequirement`?
This issue has been mentioned on Common Workflow Language Discourse. There might be relevant details there:
https://cwl.discourse.group/t/dockerrequirement-vs-containerrequirement/462/2