add default BioC_mirror for CI=TRUE env

Open LiNk-NY opened this issue 1 year ago • 8 comments

Hi Martin (@mtmorgan), here is a draft PR to redirect CI=TRUE environments to low-cost egress services via the BioC_mirror option.

cc: @almahmoud @vjcitn

LiNk-NY, Mar 04 '24 20:03

I think a complete mirror is the goal (EDIT: for current release and devel only; previous versions always route to an egress-free bucket anyway). The first step would be a constant syncing mechanism; ideally the BBS machine would later push to all the bucket destinations itself. Heavy traffic could also be routed to this mirror once it passes a certain threshold within a certain time period, to avoid excess egress fees from a single source, which is the context in which this came up again.

That said, I agree that completely replacing bioconductor.org is probably not the best way to go. The idea was very much inspired by the container-binaries reroute in BiocManager, and doing it similarly sounds great if that can be done for all OSes and for source packages. The implementation would be slightly different because it would not be per platform per se, so it can't rely on an environment variable that identifies the platform directly. It would have to start by detecting CI=TRUE and then still route based on the OS or from source, as it would on any other unknown system. I think that could probably be done by generating the CI_PLATFORM portion programmatically, if I'm understanding things correctly. If that method also works for source packages, that would be amazing and in line with the original rough idea of replicating the container-binaries reroute for an egress-free mirror: the mirror supersedes where possible, while bioconductor.org remains the default if the mirror is down or not yet up to date for that particular package.

^^All of the above comes with the disclaimer that I am very much a youngling when it comes to understanding BiocManager and the BBS/Bioc ecosystem, so please take all my ideas/opinions with a kg of salt. I am of course very open to being corrected!

almahmoud, Mar 05 '24 01:03

Thanks for all the discussion here. I am very concerned about the complexity. I wonder whether BiocManager is becoming overloaded with functions that do not address its main concern of organizing an installation for a user. It would be good to think about factoring repository-, version-, and platform-oriented calculations away from "processing user request to valid installation" calculations. Could this be a basis for well-designed tests?

vjcitn, Mar 05 '24 02:03

Thanks for the review Martin. I will work on updates.

I think setting a BioC_mirror for CI environments would be sufficient. The mirror would host both source and binary packages for all platforms (which we already produce), including https://ci-mirror.bioconductor.org/packages/X.YY/container-binaries/bioconductor_docker, which would be the preferred setup method for CI/CD platforms. It would only be a matter of rsyncing the assets over to the new URL.
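
To make the idea concrete, here is a minimal sketch of defaulting the BioC_mirror option when CI is set. The helper name `ci_mirror_option()` and its arguments are hypothetical (this is not the code in the PR), and the mirror URL is just the tentative endpoint mentioned in this thread. It assumes that an existing BioC_mirror setting should always win.

```r
# Hypothetical helper, not part of BiocManager: decide which mirror to use.
ci_mirror_option <- function(
    ci = Sys.getenv("CI"),
    current = getOption("BioC_mirror"),
    ci_mirror = "https://ci-mirror.bioconductor.org"  # tentative endpoint
) {
    # Respect a mirror that the user has already chosen.
    if (!is.null(current))
        return(current)
    # Some CI services set CI=true in lower case, so compare case-insensitively.
    if (identical(toupper(ci), "TRUE"))
        return(ci_mirror)
    # Not a CI environment: leave the option unset.
    NULL
}

mirror <- ci_mirror_option()
if (!is.null(mirror))
    options(BioC_mirror = mirror)
```

Keeping the decision in a small function with explicit arguments would also make it easy to test without touching the real environment.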

LiNk-NY, Mar 05 '24 15:03

> I think setting a BioC_mirror for CI environments would be sufficient. The mirror would host both source and binary packages for all platforms (which we already produce) including https://ci-mirror.bioconductor.org/packages/X.YY/container-binaries/bioconductor_docker which would be the preferred setup method for CI/CD platforms. It would only be a matter of rsyncing the assets over to the new URL.

Good point. Does the current patch actually redirect binary queries to ci-mirror... or is the binary URL hard-coded? I'm not sure (maybe it does...) that the usual mirroring mechanism (rsync) supports the path to binaries (there's some authentication step involved, and it might restrict what can be synced...). Also not sure that one would want the binary URL to be built on the mirror, since mirrors will often not contain the binary repository...

mtmorgan, Mar 05 '24 16:03

As a first pass, I've set up a tentative configuration for the ci-mirror.bioconductor.org endpoint where /packages/[3.18 | release | 3.19 | devel] gets rerouted to the egress-free bucket, while any other path defaults back to bioconductor.org.
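
The routing rule can be sketched in R purely as an illustration. The actual configuration lives on the endpoint, not in R; `route_ci_mirror()` is a hypothetical name, and the bucket URL is a placeholder, since the real bucket address is not given here.

```r
# Hypothetical illustration of the routing rule; not the server configuration.
route_ci_mirror <- function(
    path,
    mirrored = c("3.18", "release", "3.19", "devel"),
    bucket = "https://egress-free-bucket.example.org",  # placeholder URL
    origin = "https://bioconductor.org"
) {
    # "/packages/3.19/bioc" splits into c("", "packages", "3.19", "bioc").
    parts <- strsplit(path, "/", fixed = TRUE)[[1]]
    is_mirrored <- length(parts) >= 3 &&
        parts[2] == "packages" &&
        parts[3] %in% mirrored
    # Mirrored versions go to the bucket; everything else goes to the origin.
    base <- if (is_mirrored) bucket else origin
    paste0(base, path)
}

route_ci_mirror("/packages/3.19/bioc/src/contrib/PACKAGES")
route_ci_mirror("/packages/3.17/bioc/src/contrib/PACKAGES")
```

The first call resolves to the placeholder bucket and the second to bioconductor.org, because 3.17 is not in the mirrored set.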

almahmoud, Mar 06 '24 18:03

> Good point. Does the current patch actually redirect binary queries to ci-mirror... or is the binary URL hard-coded? I'm not sure (maybe it does...) that the usual mirroring mechanism (rsync) supports the path to binaries (there's some authentication step involved, and it might restrict what can be synced...).

The current patch checks for a valid /container-binaries/ URL. If none is found, it will fall back to bioconductor.org.
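
Roughly, the check-then-fall-back logic looks like the sketch below. This is not the patch itself: `binary_repo_url()` is a hypothetical name, and the existence check is passed in as a function so the sketch does not depend on BiocManager internals or on network access.

```r
# Hypothetical sketch of "use the mirror if the binary URL is valid,
# otherwise fall back to bioconductor.org".
# `url_exists` is any function(url) that returns TRUE or FALSE.
binary_repo_url <- function(
    version,
    mirror,
    url_exists,
    origin = "https://bioconductor.org",
    image = "bioconductor_docker"
) {
    path <- sprintf("/packages/%s/container-binaries/%s", version, image)
    candidate <- paste0(mirror, path)
    # isTRUE() treats NA, NULL, or non-logical results as "not found".
    if (isTRUE(url_exists(candidate)))
        candidate
    else
        paste0(origin, path)
}

# Example with a stand-in check that always reports "not found":
binary_repo_url("3.19", "https://ci-mirror.bioconductor.org", function(url) FALSE)
```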

> Also not sure that one would want the binary URL to be built on the mirror, since mirrors will often not contain the binary repository...

Yes, they don't often contain the binary repository, but now they can.

LiNk-NY, Mar 06 '24 18:03

Where is .BIOCONDUCTOR_CI_MIRROR defined? Unfortunately .url_exists() still "works" (and returns FALSE) when the variable passed to it isn't defined.
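
For anyone wondering how that can happen, here is a minimal sketch of the mechanism. It assumes the check evaluates its argument inside tryCatch(). Both functions are hypothetical stand-ins, not the actual .url_exists() implementation, and they test the string only, without any network access.

```r
# Hypothetical stand-in, not BiocManager's .url_exists().
check_sketch <- function(url) {
    tryCatch({
        # R evaluates `url` lazily, so an undefined variable only fails here,
        # inside tryCatch(), where the error is converted to FALSE.
        is.character(url) && nzchar(url)
    }, error = function(e) FALSE)
}

# A stricter variant evaluates the argument before entering tryCatch().
check_strict <- function(url) {
    force(url)  # an undefined variable raises "object ... not found" here
    tryCatch(is.character(url) && nzchar(url), error = function(e) FALSE)
}

check_sketch(.SOME_UNDEFINED_VARIABLE)  # quietly returns FALSE
```

With the strict variant, the same call fails loudly instead of looking like a missing URL.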

hpages, Mar 06 '24 18:03

> Where is .BIOCONDUCTOR_CI_MIRROR defined? Unfortunately .url_exists() still "works" (and returns FALSE) when the variable passed to it isn't defined.

Thanks, I just removed that bit of code.

LiNk-NY, Mar 06 '24 18:03