tools icon indicating copy to clipboard operation
tools copied to clipboard

Container cache download

Open muffato opened this issue 1 year ago • 2 comments

Fixes #3019 and #3162.

First #3162. The code has to deal with an out_path (always defined) and sometimes a cache_path too. The implementation was slightly convoluted as it was doing fresh downloads into the cache_path or out_path (first decision point) and then doing an extra copy if needed (second decision point). Because of that confusion, symlinks across container registries were not all created across both locations. I propose to reverse the logic to make it more straightforward:

  1. Always download into out_path and create its symlinks.
  2. Then, optionally copy to cache_path (and create symlinks there).

Then, #3019: I propose to handle the singularity "library" directory this way:

  1. Just like in Nextflow itself, it's considered a read-only location. It means that containers can only be copied from it, not to it, and that we shouldn't be even creating symlinks there.
  2. There is no point in having a --container-library-utilisation parameter for the library because i) remote would be redundant with the --container-cache-utilisation's remote mode, ii) amend is not possible as per the read-only rule, so iii) copy is the only possible mode.
  3. Therefore, the most natural place to use the library is as a source of containers, in parallel of https downloads and singularity pull. When NXF_SINGULARITY_LIBRARYDIR is set and the container exists in the library, it is copied to the target directories (out_path and possibly cache_path too)

PR checklist

  • [x] This comment contains a description of changes (with reason)
  • [ ] CHANGELOG.md is updated
  • [ ] Unit-tests
  • [ ] Documentation in README.md

muffato avatar Sep 07 '24 12:09 muffato

Codecov Report

Attention: Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.

Project coverage is 75.52%. Comparing base (8e47a33) to head (78650af).

Files with missing lines Patch % Lines
nf_core/pipelines/download.py 0.00% 16 Missing :warning:
Additional details and impacted files

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 07 '24 12:09 codecov[bot]

The code is there, but I still need to add unit-tests and documentation

muffato avatar Sep 10 '24 21:09 muffato

@muffato - do you think you'll have time to take a look at this PR in the near(ish) future? It'd be great to get it moving and merged if possible..

ewels avatar Nov 21 '24 09:11 ewels

Hi @ewels . Unfortunately there's too much work left to do on this PR. I can't see myself getting to the bottom of it any time soon.

muffato avatar Nov 26 '24 11:11 muffato

No problem - hopefully @mirpedrol can take it over then when she has time, if that's ok 🙏🏻

ewels avatar Nov 28 '24 08:11 ewels

No problem - hopefully @mirpedrol can take it over then when she has time, if that's ok 🙏🏻

Frankly, I consider this feature expendable. It is nice to save some download bandwidth and storage space, but the poor Seqera Containers support is haunting me much more (and also wasting significant storage).

Considering that Júlia has the most extensive knowledge of the nf-core tools codebase, I would rather suggest / entreat / beg her for working on tooling around the container.yml creation, linting and release logic. As soon as each new pipeline release ships with it, we can adopt it on the download side as well.

MatthiasZepper avatar Nov 28 '24 11:11 MatthiasZepper