Add procps to biocontainer base
We are using several biocontainer images in Nextflow for processing data for our database.
Nextflow offers some excellent reporting and tracing tools for benchmarking the steps in a workflow, which we find very helpful for deciding how much resource to request on our cluster for each process. However, these tools only work if procps is installed in the container that the process runs in. Adding it to the base image would not add much overhead, and would enable the Nextflow reporting tools in workflows that use BioContainers images.
Hi. Not all containers use the same base, so this would affect only new containers and those using the base image. Adding it to the base image could be done, but it would currently be useful for only a limited number of containers. It would also not be available in conda-based containers (raw images containing only the tool).
Can't Nextflow mount the tool into the container if needed? (Or install it as a prestart requirement if it needs it?)
Hi,
Ah ok - the small numbers of containers I use regularly all use the base image, but of course I haven't looked through everything you have exhaustively.
From my investigations around this so far:
- ps must be installed in the container that a Nextflow process is using in order to get benchmarking data for that process. If tracing is turned on and ps is missing from the container for any given process, the workflow will crash when it gets to that process. Having ps installed locally does not work.
- If you have asked Docker to run a process in a container, input and output directories are mounted under the hood. I cannot see any options in Nextflow that allow you to mount additional directories, but I will look again.
- When running locally, it is possible to hack this by copying an appropriately compiled ps binary into the input directory. However, this does not work consistently for our production team, who may run workflows on one of four different clusters with different configurations, depending on load and who is running it. Having this tool in the container would ensure this works consistently in all our environments (that is the point of containers, after all).
- Our workflows are configured such that different processes run in different containers. Even where this is not true, Nextflow starts a new instance of a container for every instance of every process. I can't find a way to sensibly run prestart commands in Nextflow, as Nextflow assumes that containers are already configured how you want. The easiest solution is to add an install command to the script for each process, but this isn't as straightforward as it might seem. Where processes use different containers, this doesn't work consistently (e.g., default users for different containers may have different privileges, and different containers may use different repositories). Even when processes use the same container, the fact that each process starts a new instance of the container means this sort of prestart would need to be configured separately for every process in the Nextflow workflow. That is frustrating for workflows that may contain hundreds of processes (not to mention adding substantially to run time), and doubly so when a workflow has more than one process using the same container.
Essentially, if ps is in a container everything just works. If not, it's kind of hackable but very frustrating for large production workflows that may not run in the same environment every time. If anyone reading this has alternative suggestions, I would be glad to hear them.
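As a rough illustration of the point above, here is a minimal sketch of checking for ps from inside a container before tracing is turned on. It assumes the image provides a POSIX shell; check_tool is a hypothetical helper name used only for this example and is not part of Nextflow or Docker.

```shell
#!/bin/sh
# Report whether a command is available on PATH inside the current environment.
# check_tool is a hypothetical helper for illustration only.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Print the status of ps; "missing" suggests tracing would fail in this container.
echo "ps: $(check_tool ps)"
```

Running something like this in each image a workflow uses (for example, as the first line of a process script during testing) should show which containers lack ps, without changing the images themselves.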
Having said that, this is not vital for us, it would just provide useful bonus features, so if you feel it is not feasible to implement feel free to close the issue.
nf-core is recommending now the biocontainers, especially the singularity ones. So I assume this issue can be closed.
@bgruening . Not everything is in BioConda/BioContainers, so the issue with images from the repository here still remains.
Adding procps to the base image is a solution that ensures all future builds (those that use this base image, of course) get ps.
If procps is not added to the base image, then every individual Dockerfile that people want to use in Nextflow would benefit from an extra apt-get install procps.
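For reference, a sketch of what that extra install step might look like, assuming a Debian- or Ubuntu-based image where apt-get is available and the build runs as root. The function name install_procps is hypothetical; the cache cleanup at the end is a common way to keep image layers small, not something this thread requires.

```shell
#!/bin/sh
# Install procps (the package that provides ps) with apt-get.
# Assumes a Debian/Ubuntu-based image and root privileges during the build.
# install_procps is a hypothetical name used only for this sketch.
install_procps() {
    apt-get update \
        && apt-get install -y --no-install-recommends procps \
        && rm -rf /var/lib/apt/lists/*
}
```

In a Dockerfile, the same three commands would typically follow a single RUN instruction so that they end up in one layer. Images built on other distributions would need their own package manager's equivalent.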
@muffato Yes, I understand this. We first discussed this in 2018 with the nf-core devs, and I think they have had a workaround so far. But even if not, I think this is what you need to do then. We should not include arbitrary dependencies that don't belong to the application - in this case a dependency of nf.
Thanks @bgruening . I wasn't satisfied by "nf-core is recommending now the biocontainers, especially the singularity ones" being the reason for closing the ticket, but I'm happy with the explanation "We should not include arbitrary dependencies that don't belong to the application - in this case a dependency of nf."
Thanks for your understanding!