
Generalised ContainerRequirement

Open · jrandall opened this issue 8 years ago · 8 comments

Docker is great, but other container packaging and execution tools exist, and it would be unfortunate for workflows to depend specifically on Docker when no Docker-specific features are actually required.

I suggest that it would be advantageous for CWL to support a more generic ContainerRequirement, which extends ProcessRequirement and specifies the minimum set of parameters needed to support containerisation without any Docker-specific features. This could consist simply of a content address describing the contents of the container rootfs, plus a pointer to where/how to obtain one (e.g. a URL). It would also need to specify the semantics of how a CommandLineTool is mapped into the generic container: the ability to run a command line inside a container seems like it ought to be fundamental to any container executor, but we may need to be explicit about what the default environment should be. Both the content address and the URL could be optional (required=False).
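To make "a content address that describes the contents of the container rootfs" more concrete, here is a minimal Python sketch of one possible scheme, assuming the rootfs is available as an unpacked directory. The function name, the `sha256:` prefix and the choice of which metadata to include are assumptions for illustration only; nothing here is part of any CWL specification.

```python
import hashlib
import os
import stat
import tempfile


def rootfs_content_address(root: str) -> str:
    """Hypothetical content address ('sha256:<hex>') for an unpacked rootfs.

    Hashes relative paths, entry types, permission bits, file contents and
    symlink targets, visiting entries in sorted order. Timestamps, owners
    and inode numbers are deliberately ignored, so two extractions of the
    same image should produce the same address.
    """
    rel_paths = []
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames + filenames:
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), root))

    digest = hashlib.sha256()
    for rel in sorted(rel_paths):
        path = os.path.join(root, rel)
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            kind = b"L"
            payload = hashlib.sha256(os.fsencode(os.readlink(path))).digest()
        elif stat.S_ISDIR(st.st_mode):
            kind, payload = b"D", b""
        elif stat.S_ISREG(st.st_mode):
            file_hash = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 16), b""):
                    file_hash.update(chunk)
            kind, payload = b"F", file_hash.digest()
        else:
            continue  # device nodes, FIFOs and sockets are skipped in this sketch
        mode = b"%o" % stat.S_IMODE(st.st_mode)
        # Paths and modes never contain NUL, and the payload length is fixed
        # by the entry kind, so each record is unambiguous.
        digest.update(kind + b"\0" + os.fsencode(rel) + b"\0" + mode + b"\0" + payload + b"\n")
    return "sha256:" + digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root in (a, b):
            os.makedirs(os.path.join(root, "etc"))
            with open(os.path.join(root, "etc", "hostname"), "w") as fh:
                fh.write("box\n")
        print(rootfs_content_address(a) == rootfs_content_address(b))  # True
```

Ignoring timestamps and ownership is a design choice: it lets the same image fetched via a URL or via a container-specific mechanism yield the same address, at the cost of not detecting changes to that metadata.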

The existing DockerRequirement would be retained, but modified to extend ContainerRequirement rather than extending ProcessRequirement directly. The purpose of DockerRequirement would be to specify docker-specific functionality, including how to obtain a container using docker pull/load/file/imageid.
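The proposed inheritance can be sketched in Python with dataclasses. The snake_case field names below are hypothetical stand-ins chosen for illustration (the actual CWL DockerRequirement uses camelCase fields such as dockerPull), and `generic_view` is an invented helper showing what a non-Docker executor would read.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ProcessRequirement:
    """Stand-in for CWL's ProcessRequirement base (no fields needed here)."""


@dataclass
class ContainerRequirement(ProcessRequirement):
    # Generic, executor-neutral fields; both optional as proposed.
    rootfs_content_address: Optional[str] = None
    rootfs_url: Optional[str] = None


@dataclass
class DockerRequirement(ContainerRequirement):
    # Docker-specific ways of obtaining the image, layered on top.
    docker_pull: Optional[str] = None
    docker_load: Optional[str] = None
    docker_file: Optional[str] = None
    docker_image_id: Optional[str] = None


def generic_view(req: ContainerRequirement) -> Dict[str, Any]:
    """Return only the generic fields, ignoring any Docker-specific ones."""
    return {f.name: getattr(req, f.name) for f in fields(ContainerRequirement)}
```

Because a DockerRequirement is a ContainerRequirement, an executor that knows nothing about Docker can still pick up the rootfs content address and URL from it.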

It would be valid to specify both ContainerRequirement and DockerRequirement; the semantics would depend on whether each is specified in requirements or in hints. Here is one way it might work:

ContainerRequirement in requirements, DockerRequirement in requirements: specifically require the docker container executor, but after obtaining the docker image as specified in the DockerRequirement, calculate the rootfs content address and require that it matches the one specified in the ContainerRequirement (if specified).

ContainerRequirement in requirements, DockerRequirement in hints: require containerisation and that the rootfs matches the specified content-address, but the implementation can choose to obtain the image from the URL specified in ContainerRequirement or the docker-specific mechanism specified in DockerRequirement (or in future from another container system's mechanism if it is hinted at as well).

ContainerRequirement in hints, DockerRequirement in requirements: specifically require docker and obtain the container following docker-specific semantics. Check the content address of the container if specified in ContainerRequirement but continue running (perhaps with a warning) if it doesn't match what is obtained via docker.

ContainerRequirement in hints, DockerRequirement in hints: the implementation is free to choose to use docker or another containerisation system, or to not use containerisation at all. Warnings may be issued if the ContainerRequirement rootfs content address does not match what is provided from docker.
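The four combinations above can be summarised as a small decision table. This is a minimal Python sketch; `Policy` and `resolve_policy` are hypothetical names, and the three flags are one possible reading of the cases described above.

```python
from dataclasses import dataclass

PLACEMENTS = ("requirements", "hints")


@dataclass(frozen=True)
class Policy:
    must_use_docker: bool      # only the Docker executor is acceptable
    must_containerise: bool    # running without any container is not allowed
    mismatch_is_fatal: bool    # rootfs content-address mismatch aborts (otherwise: warn)


def resolve_policy(container_in: str, docker_in: str) -> Policy:
    """Map where each requirement appears to the behaviour proposed above."""
    if container_in not in PLACEMENTS or docker_in not in PLACEMENTS:
        raise ValueError("placement must be 'requirements' or 'hints'")
    container_required = container_in == "requirements"
    docker_required = docker_in == "requirements"
    return Policy(
        must_use_docker=docker_required,
        must_containerise=container_required or docker_required,
        mismatch_is_fatal=container_required,
    )
```

Reading the table this way, placing DockerRequirement in requirements decides the executor, while placing ContainerRequirement in requirements decides whether a content-address mismatch is an error or only a warning.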

In future, other container systems could be added to support their specific semantics or image discovery mechanisms, but the common ContainerRequirement could always be specified to enforce checking of the rootfs content address.

jrandall, Jul 09 '15 11:07

It would be nice if ContainerRequirement were general enough that a runtime could implement it with a full VM (or perhaps even by booting a physical machine with the specified rootfs) rather than with a lightweight Linux container. To support this, ContainerRequirement could also have parameters specifying the machine architecture and OS/kernel version. Alternatively, those requirements could be captured in one or more separate high-level requirements (perhaps MachineRequirement?). The problem with splitting them up is that some steps could offer ways to be realised on multiple architectures, and each ContainerRequirement is tied to a specific architecture and kernel.
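As a sketch of why the architecture and kernel would stay bundled with each ContainerRequirement, here is a minimal Python example in which a step offers several alternatives and the runtime picks the first one that fits the host. The fields `architecture` and `min_kernel`, and the function `pick_container`, are hypothetical; host details default to Python's `platform.machine()` and `platform.release()`.

```python
import platform
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ContainerRequirement:
    rootfs_url: str
    architecture: str                            # e.g. "x86_64" or "aarch64"
    min_kernel: Optional[Tuple[int, int]] = None  # e.g. (3, 10); None means any


def _kernel_version(release: str) -> Tuple[int, int]:
    """Parse the leading major.minor of a kernel release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    return (int(match.group(1)), int(match.group(2))) if match else (0, 0)


def pick_container(options: Sequence[ContainerRequirement],
                   host_arch: Optional[str] = None,
                   host_kernel: Optional[str] = None) -> Optional[ContainerRequirement]:
    """Return the first option whose architecture and kernel both fit the host."""
    arch = host_arch if host_arch is not None else platform.machine()
    kernel = _kernel_version(host_kernel if host_kernel is not None else platform.release())
    for option in options:
        if option.architecture != arch:
            continue
        if option.min_kernel is not None and kernel < option.min_kernel:
            continue
        return option
    return None
```

Keeping architecture and kernel inside each alternative means the runtime never has to pair a separately declared MachineRequirement with the right rootfs.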

jrandall, Jul 11 '15 13:07

It would be great to have a more abstract requirement and multiple sets of hints on how to fulfil it. For example:

requirements:
  - class: CommandMatchesRequirement
    command: mytool --version
    matchesSubstring: 2.5.1  # Or e.g. "matchesPattern" for regex
hints:
  - class: DockerRequirement
    dockerPull: mytool:latest
  - class: RequirementSet
    requirements:
      - class: ContainerRequirement
        rootfsURL: https://example.com/fs
      - class: EnvVarRequirement
        envDef:
          - envName: MYVAR
            envValue: foo

The above should be interpreted as "the output of mytool --version must contain 2.5.1, and here are two ways to make that happen: Docker, or a container whose rootfs is fetched from a URL, with MYVAR=foo set in the environment".
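A minimal Python sketch of how a runtime might check such a requirement, assuming the command is given as an argv list (a string like `mytool --version` could be split with `shlex.split`). The function name `command_matches` is hypothetical; the parameters mirror the proposed `matchesSubstring` and `matchesPattern` fields.

```python
import re
import subprocess
import sys
from typing import Optional, Sequence


def command_matches(command: Sequence[str],
                    matches_substring: Optional[str] = None,
                    matches_pattern: Optional[str] = None) -> bool:
    """Run `command` and check its combined stdout+stderr against the criteria.

    Tools differ on which stream they print version info to, so both are
    searched. A missing executable raises FileNotFoundError.
    """
    result = subprocess.run(list(command), capture_output=True, text=True, check=False)
    output = result.stdout + result.stderr
    if matches_substring is not None and matches_substring not in output:
        return False
    if matches_pattern is not None and re.search(matches_pattern, output) is None:
        return False
    return True


if __name__ == "__main__":
    # Stand-in for `mytool --version`: a Python one-liner printing a version string.
    fake_tool = [sys.executable, "-c", "print('mytool 2.5.1')"]
    print(command_matches(fake_tool, matches_substring="2.5.1"))    # True
    print(command_matches(fake_tool, matches_pattern=r"\b2\.6\."))  # False
```

A runtime could try each hinted alternative in turn and keep the first environment in which this check passes.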

After running a tool or workflow, a helper tool could create a version of it in which the environment setup that was actually used is moved from hints into requirements, so that the computation can be reproduced.
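Such a helper could be as simple as the following Python sketch, which operates on the tool description loaded as a plain dict (e.g. from YAML). The name `pin_environment` and the convention of passing the index of the hint that was used are assumptions for illustration.

```python
import copy
from typing import Any, Dict


def pin_environment(tool: Dict[str, Any], used_hint: int) -> Dict[str, Any]:
    """Return a copy of `tool` with hints[used_hint] promoted to requirements.

    A RequirementSet is unpacked so that its members become individual
    requirements; the input document is left unmodified.
    """
    pinned = copy.deepcopy(tool)
    hints = pinned.get("hints", [])
    chosen = hints.pop(used_hint)
    if chosen.get("class") == "RequirementSet":
        promoted = chosen.get("requirements", [])
    else:
        promoted = [chosen]
    pinned.setdefault("requirements", []).extend(promoted)
    pinned["hints"] = hints
    return pinned
```

The alternatives that were not used are left in hints here; dropping them instead would be an equally reasonable choice.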

Could the RequirementSet approach be used to factor out a MachineRequirement cleanly?

ghost, Jul 14 '15 11:07

Interesting. CommandMatchesRequirement could also be a generic way to support verification of what container one has as well (because you could codify the validation that you have what you are supposed to have as a command to be run from inside the container).

For example:

requirements:
  - class: CommandMatchesRequirement
    command: find / -fstype "proc" -prune -o -fstype "sysfs" -prune -o -fstype "devpts" -prune -o -print | LC_ALL=C sort | cpio -o | md5sum | awk '{print $1}'
    matchesSubstring: 6fe91993f4046da88b4639c0671f36cc
hints:
  - class: DockerRequirement
    dockerPull: ubuntu:14.04

However, this would be a very expensive operation to run and it would be best if an implementation cached the result. Could we specify that all CommandMatchesRequirement commands are supposed to be pure (so that they could be cached for a particular environment)?
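If such commands were declared pure, an implementation could memoise them per environment. A minimal Python sketch follows; `PureCommandCache` and `run_locally` are hypothetical names, and the environment identifier is assumed to be something that pins the container contents exactly, such as an image digest or rootfs content address.

```python
import subprocess
from typing import Callable, Dict, Sequence, Tuple


def run_locally(command: Sequence[str]) -> str:
    """Default runner: execute the command on this host and return its stdout.

    A real executor would run the command inside the container instead.
    """
    return subprocess.run(list(command), capture_output=True, text=True, check=True).stdout


class PureCommandCache:
    """Memoise command output, assuming commands are pure per environment."""

    def __init__(self, runner: Callable[[Sequence[str]], str] = run_locally):
        self._runner = runner
        self._results: Dict[Tuple[str, Tuple[str, ...]], str] = {}

    def output(self, environment_id: str, command: Sequence[str]) -> str:
        # Same environment + same argv => reuse the earlier result.
        key = (environment_id, tuple(command))
        if key not in self._results:
            self._results[key] = self._runner(command)
        return self._results[key]
```

The purity assumption is what makes the cache sound: if a command could depend on anything outside the environment (time, network, randomness), a cached result could be stale.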

jrandall, Jul 21 '15 15:07

A few ideas for alternative ways of hosting a runtime that can satisfy tool dependencies but isn't Docker:

  • chroot() + bind mounts
  • Lightweight virtual machines & minimal OS images that can boot in a few seconds
  • PRoot: http://proot.me/
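As an example of mapping a CommandLineTool into one of these, the following Python sketch builds (but does not run) a PRoot command line that executes a tool inside an unpacked rootfs. It uses PRoot's `-r` (guest rootfs), `-b` (bind a host path) and `-w` (initial working directory) options; the function name `proot_argv` and the default bind list are assumptions, and PRoot must be installed separately to actually run the result.

```python
from typing import List, Optional, Sequence


def proot_argv(rootfs: str, command: Sequence[str],
               binds: Sequence[str] = ("/proc", "/dev", "/sys"),
               workdir: Optional[str] = None) -> List[str]:
    """Build an argv that runs `command` inside `rootfs` via PRoot."""
    argv = ["proot", "-r", rootfs]
    for path in binds:
        argv += ["-b", path]  # expose the host path at the same location in the guest
    if workdir is not None:
        argv += ["-w", workdir]
    argv += list(command)
    return argv
```

The resulting list could be passed to `subprocess.run`; no root privileges are needed, which is PRoot's main attraction compared with chroot() plus bind mounts.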

tetron, Oct 06 '15 14:10

I am strongly in favour of @ntijanic's proposal in https://github.com/common-workflow-language/common-workflow-language/issues/80#issuecomment-121213349

Recently @tetron linked to

https://www.opencontainers.org/

https://github.com/opencontainers/specs

mr-c, Oct 26 '15 14:10

This ties in very closely with the work I will be doing with @mr-c, a brief overview of which is in the wiki at Userspace Container Review.

kdm9, Feb 01 '16 17:02

A tangent on this issue discussed specifying software requirements, which is now covered by http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwareRequirement

@jrandall is there a specific containerization technology that you'd like to see as a sibling to DockerRequirement?

mr-c, Jul 18 '16 16:07

This issue has been mentioned on Common Workflow Language Discourse. There might be relevant details there:

https://cwl.discourse.group/t/dockerrequirement-vs-containerrequirement/462/2

cwl-bot, Oct 07 '21 13:10