BiocManager
add default BioC_mirror for CI=TRUE env
Hi Martin, @mtmorgan
Here is a draft PR to re-direct CI=TRUE environments to low-cost egress services via the BioC_mirror option.
cc: @almahmoud @vjcitn
I think a complete mirror is the goal (EDIT: for current release and devel only; previous versions already route to an egress-free bucket anyway). The first step would be a constant syncing mechanism, and ideally the BBS machine would later push to all the bucket destinations itself. There is also the idea of routing heavy traffic to this mirror once it passes a certain threshold within a certain time period, to avoid excess egress fees from a single source, which is the context in which this came up again.
With that being said, I agree that completely replacing bioconductor.org is probably not the best way to go. The idea was very much inspired by the container-binaries reroute in BiocManager, and doing it similarly sounds great if that can be done for all OSes and source packages. The implementation would be slightly different because it would not be per platform per se, so it can't rely on an environment variable that identifies the platform directly. It would have to start by detecting CI=TRUE and then still route based on the OS or from source, as it would on any other unknown system. I think that could probably be done by generating the CI_PLATFORM portion programmatically, if I'm understanding things correctly.
If that method can also be done for source packages, that would be amazing and in line with the original rough idea of replicating the container-binaries reroute for an egress-free mirror: the mirror supersedes where possible, while keeping the ability to default to the original bioconductor.org if the mirror is down or not yet up to date for that particular package.
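The "mirror supersedes where possible, otherwise default to bioconductor.org" idea could be sketched per package as below. This is only an illustration, not code from the PR: `pick_repository()` is a hypothetical helper, and the two index matrices are hand-made stand-ins for what `utils::available.packages()` would return for the mirror and for the origin.

```r
## Hypothetical helper (not part of BiocManager): decide per package
## whether to use the mirror or the origin, given each repository's index.
pick_repository <- function(pkgs, mirror_db, origin_db,
                            mirror = "mirror", origin = "origin")
{
    vapply(pkgs, function(pkg) {
        if (!pkg %in% rownames(origin_db))
            return(NA_character_)          # package unknown to the origin
        if (!pkg %in% rownames(mirror_db))
            return(origin)                 # mirror down or package missing
        mirror_version <- package_version(mirror_db[pkg, "Version"])
        origin_version <- package_version(origin_db[pkg, "Version"])
        ## use the mirror only when it is at least as new as the origin
        if (mirror_version >= origin_version) mirror else origin
    }, character(1))
}

## Hand-made indexes; in practice these would come from available.packages().
origin_db <- matrix(c("1.2.0", "2.0.1"), ncol = 1,
                    dimnames = list(c("pkgA", "pkgB"), "Version"))
mirror_db <- matrix(c("1.2.0", "2.0.0"), ncol = 1,
                    dimnames = list(c("pkgA", "pkgB"), "Version"))

choice <- pick_repository(c("pkgA", "pkgB"), mirror_db, origin_db)
## pkgA is served by the mirror; pkgB is stale on the mirror, so the origin wins
```

An empty mirror index (for example when the mirror cannot be reached) makes every package fall through to the origin, which matches the fallback behaviour described above.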
^^All of the above with the disclaimer that I am very much a youngling when it comes to understanding BiocManager and the BBS/Bioc ecosystem, so please take all my ideas/opinions with a kg of salt, and I am of course very open to being corrected!
Thanks for all the discussion here. I am very concerned about the complexity. I wonder whether BiocManager is becoming overloaded with functions that do not address its main concern of organizing an installation for a user. It would be good to think about factoring repository-, version-, and platform-oriented calculations away from the "processing user request to valid installation" calculations. This could be a basis for well-designed tests?
Thanks for the review Martin. I will work on updates.
I think setting a BioC_mirror for CI environments would be sufficient. The mirror would host both source and binary packages for all platforms (which we already produce), including https://ci-mirror.bioconductor.org/packages/X.YY/container-binaries/bioconductor_docker, which would be the preferred setup method for CI/CD platforms. It would only be a matter of rsyncing the assets over to the new URL.
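A minimal sketch of what "setting a BioC_mirror for CI environments" could look like, assuming the default should never override a mirror the user has already chosen. `set_ci_mirror()` is a hypothetical name for illustration and is not the function in the patch; the mirror host is the one mentioned in this thread.

```r
## Hypothetical helper (not the actual patch): set a default 'BioC_mirror'
## when the CI environment variable is TRUE, unless an option is already set.
set_ci_mirror <- function(mirror = "https://ci-mirror.bioconductor.org") {
    in_ci <- identical(toupper(Sys.getenv("CI")), "TRUE")
    if (in_ci && is.null(getOption("BioC_mirror")))
        options(BioC_mirror = mirror)
    ## return the option that is now in effect (NULL when nothing is set)
    invisible(getOption("BioC_mirror"))
}
```

Calling `set_ci_mirror()` outside CI, or when `BioC_mirror` is already set, leaves the options untouched.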
Good point. Does the current patch actually redirect binary queries to ci-mirror... or is the binary URL hard-coded? I'm not sure (maybe it does...) that the usual mirroring mechanism (rsync) supports the path to binaries (there's some authentication step involved, and it might restrict what can be synced...). Also not sure that one would want the binary URL to be built on the mirror, since mirrors will often not contain the binary repository...
As a first pass, I've set up a tentative configuration for the ci-mirror.bioconductor.org endpoint where /packages/[3.18 | release | 3.19 | devel] get rerouted to the egress-free bucket, while any other path defaults back to bioconductor.org.
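The routing rule described above can be written down as a small decision function. This is only a model of the rule in R, not the endpoint's real configuration: `route_request()` is a hypothetical name, and the bucket URL is a placeholder because the actual bucket address is not given in this thread.

```r
## Model of the tentative routing rule; names and bucket URL are placeholders.
route_request <- function(path,
                          bucket = "https://bucket.example.org",
                          origin = "https://bioconductor.org")
{
    ## only current release/devel paths are served from the bucket
    mirrored <- grepl("^/packages/(3\\.18|release|3\\.19|devel)(/|$)", path)
    ifelse(mirrored, bucket, origin)
}

route_request(c("/packages/3.18/bioc/src/contrib/PACKAGES",
                "/packages/3.17/bioc/src/contrib/PACKAGES"))
## the first path goes to the bucket, the second defaults to bioconductor.org
```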
> Good point. Does the current patch actually redirect binary queries to ci-mirror... or is the binary URL hard-coded? I'm not sure (maybe it does...) that the usual mirroring mechanism (rsync) supports the path to binaries (there's some authentication step involved, and it might restrict what can be synced...).
The current patch checks for a valid /container-binaries/ URL. If not found, it will fall back to bioconductor.org.
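The check-then-fall-back behaviour could look roughly like the sketch below. It is not the code in the patch: `binary_repository()` is a hypothetical helper, the `url_exists` argument stands in for an existence probe such as BiocManager's internal `.url_exists()`, and it assumes the mirror and bioconductor.org share the same path layout.

```r
## Hypothetical sketch: prefer the mirror's container-binaries URL and
## fall back to bioconductor.org when the mirror URL cannot be confirmed.
binary_repository <- function(version, url_exists,
                              mirror = "https://ci-mirror.bioconductor.org",
                              origin = "https://bioconductor.org")
{
    path <- sprintf("/packages/%s/container-binaries/bioconductor_docker",
                    version)
    candidate <- paste0(mirror, path)
    ## isTRUE() treats NA, NULL, or any non-TRUE answer as "not found"
    if (isTRUE(url_exists(candidate))) candidate else paste0(origin, path)
}

## Stub probes keep the example offline; a real probe would query the URL.
binary_repository("3.18", url_exists = function(u) TRUE)
binary_repository("3.18", url_exists = function(u) FALSE)
```

Passing the probe as an argument keeps the URL construction testable without network access, which is in the spirit of the earlier comment about separating repository calculations from installation logic.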
> Also not sure that one would want the binary URL to be built on the mirror, since mirrors will often not contain the binary repository...
Yes, they don't often contain the binary repository, but now they can.
Where is .BIOCONDUCTOR_CI_MIRROR defined? Unfortunately .url_exists() still "works" (and returns FALSE) when the variable passed to it isn't defined.
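One plausible explanation for this behaviour, assuming the probe evaluates its argument inside `tryCatch()`: R evaluates arguments lazily, so the "object not found" error is raised inside the handler and silently becomes `FALSE`. The functions below are simplified stand-ins written for illustration, not BiocManager's `.url_exists()`.

```r
## Simplified stand-in for a tryCatch()-based probe (no network access).
probe <- function(url) {
    tryCatch({
        ## 'url' is first evaluated here, inside tryCatch()
        nzchar(url) && startsWith(url, "https://")
    }, error = function(e) FALSE)
}

## Stricter variant: validate the argument before entering tryCatch(),
## so an undefined variable is reported instead of masked.
strict_probe <- function(url) {
    stopifnot(is.character(url), length(url) == 1L, !is.na(url))
    probe(url)
}

probe(.UNDEFINED_MIRROR)            # FALSE, the missing variable goes unnoticed
## strict_probe(.UNDEFINED_MIRROR)  # would signal "object not found"
```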
Thanks, I just removed that bit of code.