[24.2] Add option to disable auto_install in cached_explicit_singularity
Without this option there's no way to prevent writing to the cache and falling through to other resolvers if the image is not there. This will allow preconverting explicit docker requirements on CVMFS.
xref #15717
How to test the changes?
(Select all options that apply)
- [ ] I've included appropriate automated tests.
- [x] This is a refactoring of components with existing test coverage.
- [ ] Instructions for manual testing are as follows:
- [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]
License
- [x] I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.
Would love to see the docs and tests extended that have been started here https://github.com/galaxyproject/galaxy/pull/15614
Also a bit reluctant to count this as a bug fix.
All these inconsistencies in the implementation of (mulled and explicit) container resolvers make my head spin.
For mulled:
mulled_singularitydo the caching and return cached descriptioncached_mulled_singularityreturn cached description if available
For explicit
cached_explicit_singularitydo the caching and return cached descriptionexplicit_singularityjust get the container description (only "translating" explicit docker container descriptions)
Thus:
- One may consider
mulled_singularityandcached_explicit_singularityas analogous. - What is actually missing is the analog of
cached_mulled_singularityfor explicit containers, i.e. a resolver that just returns the cached image path if available. - No idea if
explicit_singularityis really useful.
Or?
Thank you for the explainer, I have also struggled to understand the purpose and function of all the resolvers, even with your documentation improvements.
Also worth noting: The difference between mulled vs. explicit is that mulled generates a container ID based on the mulling of <requirement type="package"> tags, whereas explicit only resolves <container> tags. Thus most configs probably want one each of a mulled and explicit resolver. I can add this distinction to the sample conf.
I use explicit_singularity for all my tools with explicit container requirements, this causes a command line of singularity exec ... docker://{container_id} and thus conversion is done automatically per-job. This is particularly important for Pulsar, because if I used cached_explicit_singularity, then the image would be cached on the Galaxy server and the path to that cached image inserted into the command line, but the image (and path) would be nonexistent/invalid in the Pulsar context.
The downside is of course costly image conversions on each run of that tool, which is actually what I am trying to address here. My goal is to identify all tools with explicit container requirements, preconvert all the images to singularity (out-of-band, outside Galaxy), and store them on CVMFS so they can be read by the cached_explicit_singularity resolver. And this PR is attempting to create a cached_explicit_singularity as you describe: a resolver that just returns the cached image path if available.
That said, the resolvers with analogous names should 100% operate the same as eachother in regards to pulling, caching, and the options that they take, to do otherwise is incredibly confusing and unintuitive.
Worth noting, here is my default container resolvers config, as you can see it's all precached images, no building or pulling, other than explicit_singularity. So what I want to add is cached_explicit_singularity that works the same as cached_mulled_singularity.
Worth noting, here is my default container resolvers config, as you can see it's all precached images, no building or pulling, other than
explicit_singularity. So what I want to add iscached_explicit_singularitythat works the same ascached_mulled_singularity.
How do you install cached mulled images at the moment?
In my instance my global container resolver config is:
- type: cached_explicit_singularity
cache_directory: /data/galaxy_server/galaxy/database/container_cache/singularity
- type: cached_mulled_singularity
cache_directory: /data/galaxy_server/galaxy/database/container_cache/singularity
- type: mulled_singularity
auto_install: False
cache_directory: /data/galaxy_server/galaxy/database/container_cache/singularity
That is I can install images via the API / Admin UI
And on my destinations/environments I use:
- type: cached_explicit_singularity
cache_directory: /data/galaxy_server/galaxy/database/container_cache/singularity
- type: cached_mulled_singularity
cache_directory: /data/galaxy_server/galaxy/database/container_cache/singularity
That is only cached images are used.
I then install containers with a bioblend script running in a cron job: https://github.com/bernt-matthias/ufz-galaxy-scripts/blob/1d998d9e4b0ae346c0e6793ec5b767279850a570/container/install_container.py#L124C39-L124C54
The per destination container resolvers are not-so-old-feature https://github.com/galaxyproject/galaxy/pull/15884 but probably you use this already.
That said, the resolvers with analogous names should 100% operate the same as each other in regards to pulling, caching, and the options that they take, to do otherwise is incredibly confusing and unintuitive.
Sure. Maybe we can define aliases for the existing classes and add new ones. Certainly a non trivial task.
How do you install cached mulled images at the moment?
I am using biocontainers directly, meaning that single-package containers are built by bioconda, mulled containers are built by planemo-monitor and singularity-build-bot makes sure that all docker images in the biocontainers org on quay.io are converted to singularity and mirrored to depot.galaxyproject.org/singularity. Depot images are then mirrored to /cvmfs/singularity.galaxyproject.org/all, which is my cached_mulled_singularity dir.
That just leaves the very small number of images manually placed in /cvmfs/main.galaxyproject.org/singularity, most of which are just direct docker conversions (e.g. apptainer build foo.sif docker://foo), but in the unlikely event I need a mulled container I do it by hand.
And on my destinations/environments I use:
In your case cached_explicit_singularity will still try to pull/cache and fail if it's not there, correct? Which is fine as long as you have precached everything, but in my case I want it to fall through to explicit_singularity so that if I haven't cached it myself yet, it can still be converted from docker at runtime.
The per destination container resolvers are not-so-old-feature https://github.com/galaxyproject/galaxy/pull/15884 but probably you use this already.
Yes! Thank you. I use these e.g. for GxITs.
Converted to draft for the moment, I have dug deeper into container resolution changes and will work my way back up to this.
Just to record my exchange with @natefoo here, I am on board with this change (minus if (self.auto_install or install) and not os.path.exists(cache_path):, that should be an unreachable condition after the changes here ... no harm though).
The resolver test makes the wrong assertions, both the docker-based and singularity based toolbox GET should not result in cached images, I think this would need to be changed in the test.
The resolver test makes the wrong assertions,
The test should depict the current state of the implementation. Is the test wrong, or the implementation?
What Nate is doing here is correct and fixing a bug. So the test assumption seems to be wrong but I haven't wrapped my head around exactly what the test is testing.
@mvdbeek what would you like to see done here? It'd be great to get this finished up and merged.
Fix the test is all
I don't think I know what the correct way to do that is.