
docker.registry needs to override process.container when a registry is used in the container URI

Open stevekm opened this issue 7 months ago • 18 comments

When you define your Nextflow process with a process.container that includes a container registry in the URI, Nextflow ignores the docker.registry setting and uses the registry listed in the process.container.

Example:

https://github.com/nf-core/scdownstream/blob/1512d6ff5e3f2484c36b1bd1d53a0f435a7b3ae4/modules/nf-core/cellbender/merge/main.nf#L8

process CELLBENDER_MERGE {
    tag "$meta.id"
    label 'process_single'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'oras://community.wave.seqera.io/library/cellbender:0.3.0--c4addb97ab2d83fe':
        'community.wave.seqera.io/library/cellbender:0.3.0--41318a055fc3aacb' }"


Here the process.container is community.wave.seqera.io/library/cellbender:0.3.0--41318a055fc3aacb where community.wave.seqera.io is the name of the Docker container registry.

However, when we set docker.registry = 12345678900.dkr.ecr.us-east-1.aws.com (etc.) to point to our internal AWS ECR registry, we expect all containers to be pulled from our internal registry, and a missing container to error out (prompting us to review and import the container into our ECR). This is not the case; instead, Nextflow ignores docker.registry and pulls the container from the public Wave registry.
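
For concreteness, here is roughly what we configure versus what we observe (a sketch only; the ECR hostname is a placeholder):

// nextflow.config (sketch)
docker.enabled  = true
docker.registry = '12345678900.dkr.ecr.us-east-1.aws.com'

// expected: every container, including fully qualified names like
//           'community.wave.seqera.io/library/cellbender:0.3.0--41318a055fc3aacb',
//           is resolved against the internal ECR (and fails if it was not imported)
// observed: only unqualified names (e.g. 'biocontainers/samtools:1.17') get the
//           registry prepended; fully qualified names are pulled from their own
//           registry, and docker.registry is ignored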

This is important to meet Infosec compliance requirements in environments that require all containers to be internally hosted.

Similarly, this impacted our CI/CD as well: https://github.com/seqeralabs/nf-canary/issues/32

I was aware that nf-core was moving towards usage of Wave containers, which we had prepared for with our ECR import methodologies. However, I was surprised to see that nf-core pipelines now appear to be using this syntax of a hard-coded container registry in the pipeline, which breaks the usage of docker.registry that we relied on to ensure we are getting our internal version of the container. @ewels

It seems like the easiest solution would be for Nextflow to allow docker.registry to actually control the Docker registry used, and not get ignored when process.container includes a registry in the URI. If needed I can post some code snippets that show some methods used to programmatically parse the container URI to detect if a registry is used in the URI name (and then you would be able to strip it more easily from the container that gets used).

The inability to control where your container is pulled from, because docker.registry gets ignored, seems like a behavioral oversight of some sort. What do you think? Thanks.

Environment

  • Nextflow version: 24.10.4
  • Operating system: Linux AWS

stevekm avatar Jun 02 '25 17:06 stevekm

Wouldn't promoting the docker registry over the full container URL be highly error-prone? It is assuming that the custom registry has the exact same containers as the full container URLs.

Also, you don't know in general which part of the full container URL constitutes the "registry". Is it community.wave.seqera.io/library/ or just community.wave.seqera.io? Easy to figure out for specific cases but not in general

bentsherman avatar Jun 02 '25 17:06 bentsherman

The use of docker.registry within nf-core is pretty recent and I don't think that it was ever comprehensive. There have always been exceptions to the quay.io hosting, despite us trying to push in that direction for simplicity.

I think that the current behaviour is sensible for the majority of users (simple defaults, ability to override). We generally need to be able to specify multiple registries, so this is the desired behaviour. The only solution that I can think of is to move docker.registry to process scope so that it can be set at a process-specific level. But if you're doing that then you may as well just override process.container. Trying to guess container URI structures is much more difficult to generalise than it might seem, as @bentsherman says (I've been bitten by this before).

Note that the Wave "mirror" functionality was introduced to handle exactly the use case that you describe... 👀

Are you able to supply a Nextflow config that simply overrides the process.container statements?

ewels avatar Jun 02 '25 19:06 ewels

Are you able to supply a Nextflow config that simply overrides the process.container statements?

This is what I would recommend as well. You can use nextflow inspect to generate a config, search-and-replace with your desired registry, validate it, and bake it into your pipeline as a config profile. Essentially the build-time equivalent of what you are suggesting.
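
For example (process name and registry hostname are illustrative only), the profile generated from the inspect output might end up looking something like this after the registry substitution, and could then be supplied at launch with -c:

// ecr-containers.config -- sketch of an inspect-generated profile with the registry replaced
process {
    withName: 'CELLBENDER_MERGE' {
        container = '12345678900.dkr.ecr.us-east-1.aws.com/library/cellbender:0.3.0--41318a055fc3aacb'
    }
}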

bentsherman avatar Jun 02 '25 20:06 bentsherman

Wouldn't promoting the docker registry over the full container URL be highly error-prone? It is assuming that the custom registry has the exact same containers as the full container URLs.

Also, you don't know in general which part of the full container URL constitutes the "registry". Is it community.wave.seqera.io/library/ or just community.wave.seqera.io? Easy to figure out for specific cases but not in general

We have a separate process set up (via GitHub Actions) in which users are expected to submit the list of their desired public containers (docker.io, quay.io, Wave, etc.) to be imported into the ECR. Our Nextflow configs for pipeline executions then use docker.registry to point back to our ECR. It is expected that if a user did not pre-import the containers into the ECR, the pipelines should fail; indeed, this has been the behavior for some time now, because most nf-core pipelines also utilized docker.registry and omitted the registry from the process.container. So I was honestly under the impression that this was a "best practice" and that, in general, Nextflow processes' container directives should intentionally omit the registry in favor of using docker.registry like this, to enable portability.

For the simple case of the nf-canary pipeline I linked, we were able to override process.container with a supplemental Nextflow config file. But this is not feasible, or even possible, for most of the rest of the nf-core pipelines. Those pipelines are too big, too complicated, and are regularly updated. No one could be expected to re-write every nf-core pipeline's nextflow.config to override all of the process.container directives, and then be responsible for keeping that up to date for every new git branch and version release of every single nf-core pipeline.

stevekm avatar Jun 02 '25 20:06 stevekm

Note that the Wave "mirror" functionality was introduced to handle exactly the use case that you describe... 👀

We have separate issues with Wave usage, with solutions in development, but in general Wave is not really a solution for this problem. It's still pulling in public copies of containers from external locations (and sharing a lot of information in the process). The requirement we had been satisfying with the combination of external container pre-import + docker.registry is that containers running on company-internal AWS infra need to be hosted on (and preferably deployed from) that same internal AWS account's ECR.

stevekm avatar Jun 02 '25 20:06 stevekm

Just to clarify, the proposed solution is that when Nextflow is resolving the final URI of the container to pull, if both docker.registry and process.container are set, it should take these basic steps (a rough sketch follows the list below):

  • split the given URI apart (process.container)
  • check for known registry names in parts[0]
  • put the library, container, and tag parts back together
  • pre-pend the desired internal registry name (docker.registry)
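
A rough Groovy sketch of those steps, for illustration only (this is not Nextflow's actual implementation, and the "first segment looks like a registry host" check is a heuristic assumption that will not cover every URI form):

// strip a leading registry host from a container URI and prepend the internal one
def rewriteRegistry(String container, String targetRegistry) {
    def parts = container.tokenize('/')
    // treat the first path segment as a registry if it contains a '.' or ':' (heuristic)
    def hasRegistry = parts.size() > 1 && (parts[0].contains('.') || parts[0].contains(':'))
    def imagePath = hasRegistry ? parts.drop(1).join('/') : container
    return "${targetRegistry}/${imagePath}".toString()
}

assert rewriteRegistry('community.wave.seqera.io/library/cellbender:0.3.0--41318a055fc3aacb',
        '12345678900.dkr.ecr.us-east-1.aws.com') ==
        '12345678900.dkr.ecr.us-east-1.aws.com/library/cellbender:0.3.0--41318a055fc3aacb'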

Also, I just wanted to point out:

This is what I would recommend as well. You can use nextflow inspect to generate a config, search-and-replace with your desired registry, validate it, and bake it into your pipeline as a config profile. Essentially the build-time equivalent of what you are suggesting.

No one is actually going to do all this, unfortunately. It's expected that users will bring their own pipelines, and users expect that nf-core pipelines will "just work". We already have a non-trivial amount of work that must be done to bring Nextflow, Seqera, and nf-core into Infosec compliance, so adding this kind of burden on top would just result in no one using nf-core, and likely reduced Nextflow and Seqera adoption.

I think by far the best course of action would be for Nextflow to be updated so that docker.registry works the way one would expect when you configure Nextflow to use a different registry than the one in the process directive.

stevekm avatar Jun 03 '25 01:06 stevekm

I think by far the best course of action would be for Nextflow to be updated so that docker.registry works the way one would expect when you configure Nextflow to use a different registry than the one in the process directive.

It's important to note this is not Nextflow behaviour but the behaviour of your Docker engine. Nextflow is simply passing the URI to the Docker service (in your case, on AWS Batch).

It's generally regarded as a security concern to replace the docker.registry, because a malicious actor could replace Docker images without your knowledge. There's more detail in this excellent Stackoverflow answer: https://stackoverflow.com/a/67351972.

I believe what you are looking for is a pull through cache but I've never used one myself so I can't guarantee this will solve your problem.

It's still pulling in public copies of containers from external locations (and sharing a lot of information in the process).

I'm not sure I understand the difference between Wave copying containers and doing it manually, since it will be the same process?

adamrtalbot avatar Jun 03 '25 09:06 adamrtalbot

No one is actually going to do all this, unfortunately. It's expected that users will bring their own pipelines, and users expect that nf-core pipelines will "just work".

I would only use the inspect approach if it is fully automated, which I think you could do here. So if you have some CI action that you take for every Nextflow pipeline, overriding these containers at build-time (and validating them) could just be another step in CI.

bentsherman avatar Jun 03 '25 11:06 bentsherman

I think by far the best course of action would be for Nextflow to be updated so that docker.registry works the way one would expect when you configure Nextflow to use a different registry than the one in the process directive.

It's important to note this is not Nextflow behaviour but the behaviour of your Docker engine. Nextflow is simply passing the URI to the Docker service (in your case, on AWS Batch).

It's generally regarded as a security concern to replace the docker.registry, because a malicious actor could replace Docker images without your knowledge. There's more detail in this excellent Stackoverflow answer: https://stackoverflow.com/a/67351972.

I believe what you are looking for is a pull through cache but I've never used one myself so I can't guarantee this will solve your problem.

It's still pulling in public copies of containers from external locations (and sharing a lot of information in the process).

I'm not sure I understand the difference between Wave copying containers and doing it manually, since it will be the same process?

Thanks for pointing this out. This does not apply to us, however. The entire premise of the Infosec requirements here is intended to deal with this exact problem.

  • We import or build all containers in the internal AWS ECR, where they are subject to security scanning, build logging, provenance tracking, qualification & validation, etc.
  • When we run workloads (e.g. Nextflow pipelines), we use ONLY the containers that we imported into the ECR and that have gone through the described scanning, logging, and tracking process.

It's actually the opposite of the case you describe; constantly using the public container, e.g. 'community.wave.seqera.io/library/cellbender:0.3.0--41318a055fc3aacb', leaves us subject to the security concerns that prompted the container internalization & tracking requirements in the first place.

Regardless of the philosophical considerations over security attack surface and such, the requirement which we were satisfying successfully (after 1yr+ of support from Seqera and 1yr+ of work on our end internally to develop these container handling methods specifically for Nextflow and Seqera) was based largely on these docs and the described behavior:

https://www.nextflow.io/docs/latest/reference/config.html#docker

docker.registry The registry from where Docker images are pulled. It should be only used to specify a private registry server. It should NOT include the protocol prefix i.e. http://.

To this end, we had numerous support tickets on this subject and built a large amount of supporting infra, based on the assumption that Nextflow's docker.registry would behave as described to control the internal Docker registry used as per our internal Infosec compliance requirements.

So ultimately, Nextflow is not behaving as described: it is not controlling the container registry. So I think the best course of action is that this should be fixed so that it behaves as described.

No one is actually going to do all this, unfortunately. It's expected that users will bring their own pipelines, and users expect that nf-core pipelines will "just work".

I would only use the inspect approach if it is fully automated, which I think you could do here. So if you have some CI action that you take for every Nextflow pipeline, overriding these containers at build-time (and validating them) could just be another step in CI.

Re-writing the nextflow.config files for nf-core pipelines is outside of our scope. It's also beyond the expectation of any of the users we are trying to onboard, who are eager to run nf-core pipelines. And, as mentioned, there is no way anyone on our end could be expected to maintain such configs over time. If docker.registry simply worked the way it's described, the issue would be resolved.

stevekm avatar Jun 03 '25 18:06 stevekm

If docker.registry simply worked the way it's described, the issue would be resolved.

The docker.registry setting was never intended to override absolute container URLs. I had to double-check a few things (e8b708831c43806e2a5758895320d55fc7979b17) but it is clear that this setting is behaving as intended and has been that way for a long time.

Likely wasn't documented because it was implemented so long ago. I will make sure the docs get updated.

But I also think it is the right choice. Forcing a global override on even fully-specified container URLs gives you no way to use containers outside of the global registry. I appreciate that this isn't a problem in your case, but we have to think about all users and their use cases -- the change you are asking for would be a breaking change.

On the other hand, nextflow inspect gives you (and your users) everything you need to override these container URLs at build-time in a way that is tailored to your specific needs. It does not require modifying the pipeline -- you are simply generating a config profile and including it at runtime. This kind of thing is normal for nf-core pipelines because they do not come with bespoke profiles for your environment, so you almost always need to create one yourself.

bentsherman avatar Jun 04 '25 16:06 bentsherman

If changing the behavior of docker.registry to force the desired registry to be used would be a "breaking change", then I think we should just have a new setting, like docker.registryOverride = <my_ECR>, that can perform this function.

I think it's important to point out that this "accepted behavior" of docker.registry fundamentally breaks the ability to control how pipelines execute in our AWS environment. This is a severe Infosec violation; it puts all of Nextflow into a non-compliant security state. I think there is a fundamental disconnect: the assumption that "all users should be able to use public resources (e.g. publicly hosted containers)" is not appropriate for these enterprise environments (where organizations are paying to use Nextflow and Seqera).

Something to consider: some environments that want to run Nextflow pipelines may even be air-gapped, or otherwise have no public internet access to e.g. quay.io or community.wave.seqera.io, and may instead be required to use only an internal self-hosted Docker registry. The implication is that docker.registry would resolve this. But it does not, because the current behavior undermines control of pipeline execution and resource usage, and hard-codes reliance on external public resources.

So in reality, the behavior here is not actually for "all users". It's only for users working in environments and organizations that are able to freely and indiscriminately use public external resources.

The simple solution is to provide a setting so that the end user can control which registry their containers are pulled from. If docker.registry does not do this, then it seems like a new setting that does would be great.

stevekm avatar Jun 05 '25 15:06 stevekm

It is true that the current registry override does not satisfy the needs of a secured environment. But neither does a hard override, not on its own.

It's not just that you need to control the registry -- it's also that users cannot use containers outside your registry. Or put a different way, you have to make sure that your registry provides every container that they need.

For example, I am considering a separate setting for a hard override, something like this:

docker.registryOverrides = [
  'community.wave.seqera.io',
  'quay.io'
]

But this does not solve the whole problem. If you add this setting and try to run a pipeline out-of-the-box, and the pipeline requires some container that you haven't accounted for, the run will fail.

So you still need to scan these container directives in order to import them into your registry, and the most reliable way to do that is through nextflow inspect. Don't rely on your users to tell you what containers they need, and risk them making a mistake or forgetting something -- let the pipeline tell you when they register it. Bake the generated config -- with explicit container overrides -- into their launch process. Eliminate the entire class of runtime errors due to invalid containers.

Well, to be fair, I am assuming that users are "registering" versioned pipelines with you before they use them, and that you already have a CI automation for this that could be augmented to also inspect / pre-import containers. Maybe that is overkill for an R&D environment, in which case I would understand not wanting to add a new layer of red tape just for this.

In that case I think the next best thing would be this list of registry overrides, and at least using nextflow inspect to handle the container pre-import, if not for the override.

bentsherman avatar Jun 05 '25 20:06 bentsherman

Related question -- do you currently have the ability to block public registries, so that the pipeline fails instead of reaching outside of the secured environment?

If not, maybe we should add a flag for this, since NXF_OFFLINE is probably too restrictive. Such a setting might also be broader than just containers; it could also be applied to Nextflow plugins, remote files, HTTP requests, etc.

bentsherman avatar Jun 05 '25 22:06 bentsherman

It's not just that you need to control the registry -- it's also that users cannot use containers outside your registry. Or put a different way, you have to make sure that your registry provides every container that they need.

But this does not solve the whole problem. If you add this setting and try to run a pipeline out-of-the-box, and the pipeline requires some container that you haven't accounted for, the run will fail.

This is the exact pattern that we are trying to capture: the onus is on the user to ensure that their pipelines are compliant in our environment (and by extension, the entire organization), prior to launching the pipeline. As you've stated, we have tools for them to accomplish that without our intervention. The intention is specifically to have pipelines fail if they do not meet this prerequisite of compliant containers.

We can likely block some internet access or put the registries on a blacklist, but that creates a disruption for the users. This would require the users to provide the custom configs for each affected nf-core pipeline (as discussed above), but that is not a sustainable or robust solution for our user base or any larger organization.

kchaung-lilly avatar Jun 06 '25 17:06 kchaung-lilly

Okay, then it sounds like we need to add an explicit setting to prevent Nextflow from pulling from outside the private registry.

I am still skeptical that the auto-generated configs can't be done in a user-friendly way, but you guys know your stack better than me

Here is a PR that you can try: #6178

bentsherman avatar Jun 06 '25 19:06 bentsherman

A lighter alternative would be just a flag like docker.forceRegistry = true that simply fails the pipeline if a fully-qualified container is given, without trying to override it. That would require users to override their configs prior to launch, but they will have to do this anyway to generate the list of registry overrides.
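
For illustration, a hypothetical config using such a flag might look like this (docker.forceRegistry does not exist today, and the hostname is a placeholder):

docker.registry      = '12345678900.dkr.ecr.us-east-1.aws.com'
docker.forceRegistry = true   // proposed: fail the run if process.container is fully qualified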

bentsherman avatar Jun 06 '25 19:06 bentsherman

What about a docker.whitelist which is an array of registries? Fail early if you try to pull a container not in that set.

adamrtalbot avatar Jun 06 '25 19:06 adamrtalbot

It seems like the easiest solution would be for Nextflow to allow docker.registry to actually control the Docker registry used, and not get ignored when process.container includes a registry in the URI

I think it's a fair request, and likely the best approach to address @stevekm's needs. It would be enough to add a flag, e.g. docker.registryOverrides, to extend the semantics of docker.registry to fully qualified container names.

pditommaso avatar Jun 10 '25 14:06 pditommaso

Thanks for the help, everyone.

Okay, then it sounds like we need to add an explicit setting to prevent Nextflow from pulling from outside the private registry.

I am still skeptical that the auto-generated configs can't be done in a user-friendly way, but you guys know your stack better than me

Here is a PR that you can try: #6178

@bentsherman how would I go about testing this out? For interactive pipeline development, I typically install Nextflow from conda. I'm not sure what the methodology is to check out and run a PR branch of Nextflow like this.

stevekm avatar Jun 18 '25 15:06 stevekm

Build with make pack as described here and you will get a standalone binary that you can run wherever

bentsherman avatar Jun 18 '25 15:06 bentsherman

Build with make pack as described here and you will get a standalone binary that you can run wherever

Thanks @bentsherman, but I do not think it's possible to do that with Seqera Platform. Is it? We are not able to run Docker containers with Nextflow anywhere except on Seqera Platform + AWS Batch.

stevekm avatar Jun 19 '25 21:06 stevekm

You can override the launch container, but that applies to your entire Platform instance and requires a restart, so it's probably not a good option.

You can use the pre-run script to inject your PR build and replace the default nextflow version -- that's probably your best option

bentsherman avatar Jun 20 '25 13:06 bentsherman

If you launch with the API there is a launchContainer parameter for specifying an alternative Nextflow launch container: https://docs.seqera.io/platform-api/create-workflow-launch

adamrtalbot avatar Jun 20 '25 13:06 adamrtalbot

It seems like the easiest solution would be for Nextflow to allow docker.registry to actually control the Docker registry used, and not get ignored when process.container includes a registry in the URI. If needed I can post some code snippets that show some methods used to programmatically parse the container URI to detect if a registry is used in the URI name (and then you would be able to strip it more easily from the container that gets used).

A solution has been provided here: https://github.com/nextflow-io/nextflow/pull/6205

pditommaso avatar Jun 23 '25 07:06 pditommaso

Solved by #6205

pditommaso avatar Jul 06 '25 12:07 pditommaso