
Pipeline download: purpose of cache-utilization parameters after refactoring

Open JulianFlesch opened this issue 3 months ago • 3 comments

Note: This was originally a comment on the main download functionality pull request https://github.com/nf-core/tools/pull/3634

To comment on https://github.com/nf-core/tools/pull/3634#discussion_r2195332147 and https://github.com/nf-core/tools/pull/3634#discussion_r2223736610 , here are my observations from testing combinations of cache/library parameters and options.

  1. The proposed implementation/purpose of -u copy is to create a standalone copy of the pipeline. Thus, its local singularity-images/ directory must be complete, and nextflow.config must be modified to point at it. The pipeline can then run without a cache/library directory externally set. Rightly, the command takes images from the pre-existing cache or library if possible, rather than downloading absolutely everything. The current implementation also copies images that are downloaded to the cache directory. This has no runtime effect but honours the purpose of the cache/library: after the command is run, they together hold all images as well, thus helping future / other invocations of Nextflow. Note that the cache may not hold all images itself, as some may rather be in the library only.
  2. The proposed implementation/purpose of -u amend is to create a copy of the pipeline that entirely relies on the external cache alone. That's why the local singularity-images/ directory is left empty and all images are rather deposited into the cache. nextflow.config is not modified. Here also, the command only downloads the images that are not present in the cache/library. Contrary to -u copy, images are copied to the cache to make it complete on its own, rather than allowing some images to be in the library only.
  3. However, I don't understand the implementation/purpose of omitting the -u option. When I run nf-core pipelines download without -u, the local singularity-images/ directory is made complete but nextflow.config isn't updated, meaning singularity-images/ is not used. Fortunately, as with -u copy, the cache is updated, so the pipeline can still run offline provided the cache (and the library) are set externally. It's essentially something in between -u copy and -u amend, but I can't work out the rationale.
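To make the comparison easier to follow, here is a minimal sketch that records the three observations above as a lookup table. It is only a restatement of what I saw when testing, not code from nf-core/tools; the `Outcome` class, its field names, and `runs_standalone` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    local_images_complete: bool   # is singularity-images/ fully populated?
    config_points_to_local: bool  # was nextflow.config modified to use it?
    cache_alone_complete: bool    # is the cache guaranteed complete on its own?


# Observed behaviour keyed by the value given to -u (None = option omitted).
# These values summarise the observations in this comment, nothing more.
OBSERVED = {
    "copy": Outcome(True, True, False),
    "amend": Outcome(False, False, True),
    None: Outcome(True, False, False),
}


def runs_standalone(mode):
    """Standalone use needs local images AND a config that points at them."""
    outcome = OBSERVED[mode]
    return outcome.local_images_complete and outcome.config_points_to_local
```

With this table, only `runs_standalone("copy")` is true: omitting -u fills singularity-images/ but leaves nextflow.config untouched, which is the puzzling middle ground described in point 3.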

My thinking is that:

  1. In -u amend mode, no need to duplicate images from the library to the cache. In other words, assume that the pipeline will run with both the cache and library set the same way as in nf-core pipelines download, rather than only the cache.
  2. In -u amend mode, no need to create a local singularity-images/ directory if it's left empty.
  3. Don't allow -u to be unset?
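Point 1 could be sketched roughly as follows: under the proposed -u amend behaviour, an image already present in the library is left where it is, and only images missing from both the library and the cache are downloaded into the cache. This is a sketch under my own assumptions; `plan_amend` and the image file names are hypothetical and do not come from the nf-core/tools code base.

```python
def plan_amend(required, cache, library):
    """Split the required images by where they would come from.

    required, cache and library are sets of image file names.
    Nothing is copied from the library to the cache in this proposal.
    """
    from_library = required & library
    from_cache = (required - library) & cache
    download_to_cache = required - library - cache
    return {
        "from_library": from_library,
        "from_cache": from_cache,
        "download_to_cache": download_to_cache,
    }


plan = plan_amend(
    required={"a.img", "b.img", "c.img"},
    cache={"b.img"},
    library={"a.img"},
)
print(sorted(plan["download_to_cache"]))  # prints ['c.img']
```

In this example only c.img needs downloading; a.img stays in the library and is not duplicated into the cache.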

Originally posted by @muffato in https://github.com/nf-core/tools/issues/3634#issuecomment-3193885334

JulianFlesch, Aug 27 '25 11:08

Hi @muffato thanks for your comments!

The available values for -u are currently ["amend", "copy", "remote"].

The proposed implementation/purpose of -u amend is to create a copy of the pipeline that entirely relies on the external cache alone. That's why the local singularity-images/ directory is left empty and all images are rather deposited into the cache. nextflow.config is not modified. Here also, the command only downloads the images that are not present in the cache/library. Contrary to -u copy, images are copied to the cache to make it complete on its own, rather than allowing some images to be in the library only.

This should be handled by the env variable NXF_SINGULARITY_CACHEDIR, no? If you set it before running your pipeline, then singularity should check there for images and find them cached 🤔

In -u amend mode, no need to duplicate images from the library to the cache. In other words, assume that the pipeline will run with both the cache and library set the same way as in nf-core pipelines download, rather than only the cache.

Not sure that is being done; it should only use NXF_SINGULARITY_CACHEDIR.

In -u amend mode, no need to create a local singularity-images/ directory if it's left empty.

Sure, that can be skipped 👍

Don't allow -u to be unset?

Better yet, set one of them as default, I would suggest "copy" as it would be the most expected behavior imho.
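As a rough illustration of what a default would look like, here is a sketch using the standard-library argparse module. It is not the actual nf-core/tools command definition; the program name, the `cache_utilisation` destination, and the help text are hypothetical, and only the -u flag and its three values come from this thread.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="download-sketch")
    parser.add_argument(
        "-u",
        dest="cache_utilisation",  # hypothetical attribute name
        choices=["amend", "copy", "remote"],
        default="copy",  # the suggested default: a complete local copy
        help="How to use the container cache (illustrative only).",
    )
    return parser


args = build_parser().parse_args([])
print(args.cache_utilisation)  # prints "copy"
```

With a default in place, omitting -u would no longer produce a fourth, undocumented behaviour.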

What are your thoughts on this @MatthiasZepper ? Are we missing something?

JulianFlesch, Sep 09 '25 08:09

Better yet, set one of them as default, I would suggest "copy" as it would be the most expected behavior imho.

Yes, that is a good idea.

Ultimately, cache utilization is an old parameter that was initially a Boolean flag to switch between False (copy) and True (amend). Because I needed remote as a third option, it was changed into the current form. Therefore, it may be worth considering whether we should replace it entirely with a new and possibly clearer parameter.

Just a quick draft, but something along the lines of --output-style / --output-spec with options like self-reliant, local-cache, remote-cache, local-library-and-cache etc. could work?

MatthiasZepper, Sep 10 '25 17:09

The proposed implementation/purpose of -u amend is to create a copy of the pipeline that entirely relies on the external cache alone. That's why the local singularity-images/ directory is left empty and all images are rather deposited into the cache. nextflow.config is not modified. Here also, the command only downloads the images that are not present in the cache/library. Contrary to -u copy, images are copied to the cache to make it complete on its own, rather than allowing some images to be in the library only.

This should be handled by the env variable NXF_SINGULARITY_CACHEDIR, no? If you set it before running your pipeline, then singularity should check there for images and find them cached 🤔

That's not the problem. The problem is that with -u amend, unnecessary copies of the images are being made.

  • -u copy. The end result is singularity-images/ being complete. That's perfect for standalone use and for copying to other systems.
  • -u amend. The end result is NXF_SINGULARITY_CACHEDIR being complete, even if NXF_SINGULARITY_LIBRARYDIR is set and already contains images. This duplicates images unnecessarily and it is at odds with Nextflow's own behaviour (Nextflow does not copy images from NXF_SINGULARITY_LIBRARYDIR to NXF_SINGULARITY_CACHEDIR).
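To illustrate why the duplication is unnecessary, here is a small sketch of the lookup order as I understand it from the Nextflow documentation: the library directory is checked first, then the cache directory. It is a simplified model, not Nextflow's implementation; `find_image` and its `env` parameter are hypothetical.

```python
import os
from pathlib import Path


def find_image(name, env=None):
    """Return the first existing copy of an image, library before cache."""
    env = os.environ if env is None else env
    for var in ("NXF_SINGULARITY_LIBRARYDIR", "NXF_SINGULARITY_CACHEDIR"):
        directory = env.get(var)
        if directory:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None  # not found locally; a download would be needed
```

An image that exists only in the library is found at the first step, so copying it into the cache changes nothing at runtime as long as both variables are set.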

In -u amend mode, no need to duplicate images from the library to the cache. In other words, assume that the pipeline will run with both the cache and library set the same way as in nf-core pipelines download, rather than only the cache.

Not sure that is being done; it should only use NXF_SINGULARITY_CACHEDIR.

Why would we assume that the pipeline will run in an environment where NXF_SINGULARITY_CACHEDIR is set the same way as in nf-core pipelines download but NXF_SINGULARITY_LIBRARYDIR is not?

Therefore, it may be worth considering whether we should replace it entirely with a new and possibly clearer parameter.

Just a quick draft, but something along the lines of --output-style / --output-spec with options like self-reliant, local-cache, remote-cache, local-library-and-cache etc. could work?

Renaming the parameter may help, but I don't think there are that many options. I think the only two options are:

  1. Make a complete local copy (what -u copy does)
    • -u remote is a special case where the user states that some images are already downloaded and can be skipped.
  2. Populate NXF_SINGULARITY_CACHEDIR (-u amend) so that Nextflow's library+cache are complete.

Don't allow -u to be unset?

Better yet, set one of them as default, I would suggest "copy" as it would be the most expected behavior imho.

Perfect suggestion 👍🏼

muffato, Sep 10 '25 23:09