
autobump for GitHub releases fails - presumably because of broken regex

Open nevrome opened this issue 8 months ago • 1 comments

Our bioconda releases of poseidon-trident and poseidon-xerxes fail to autobump, forcing @stschiff, @TCLamnidis and me to open PRs manually upon every release.

I can reproduce the bioconda-utils behaviour with

singularity run https://depot.galaxyproject.org/singularity/bioconda-utils:0.18.1--pyhdfd78af_0 \
bioconda-utils autobump recipes/ config.yml --packages poseidon-trident

which yields:

10:15:28 BIOCONDA INFO Hosters loaded: ['FTPHoster', 'GithubRelease', 'GithubTag', 'GithubReleaseAttachment', 'GithubRepoStore', 'Bioconductor', 'CargoPort', 'SourceForge', 'PyPi', 'Bioarchive', 'CPAN', 'CRAN', 'BitBucketTag', 'BitBucketDownload', 'GitlabTag']
10:15:28 BIOCONDA WARNING Selected 1 packages
10:15:28 BIOCONDA INFO Building Recipe DAG
10:16:13 BIOCONDA INFO Building Recipe DAG: done (8280 nodes, 27330 edges)
10:16:14 BIOCONDA WARNING Graph contains 1 packages (blacklist excluded)
10:16:14 BIOCONDA WARNING Excluding 745 blacklisted recipes
10:16:14 BIOCONDA INFO Loading package lists for ['conda-forge']
10:16:55 BIOCONDA INFO Running pipeline with these steps:
10:16:55 BIOCONDA INFO  1. Exclude recipes disabled via config
10:16:55 BIOCONDA INFO  2. Exclude blacklisted recipes: build-fail-blacklist / 745 recipes
10:16:55 BIOCONDA INFO  3. Exclude sub-recipes
10:16:55 BIOCONDA INFO  4. Exclude recipes depending on packages in need of update
10:16:55 BIOCONDA INFO  5. Load the recipe from the filesystem
10:16:55 BIOCONDA INFO  6. Exclude recipes in conda-forge
10:16:55 BIOCONDA INFO  7. Bump recipes in need of rebuild after pinning changes
10:16:55 BIOCONDA INFO  8. Scan upstream for new releases and update recipe
10:16:55 BIOCONDA INFO  9. Update source checksums
10:16:55 BIOCONDA INFO  10. Write recipe to filesystem
10:16:57 BIOCONDA INFO Package poseidon-trident=1.3.0.4=h9325052_0 missing!
10:16:57 BIOCONDA INFO poseidon-trident needs rebuild. Bumping buildnumber to 1
10:16:58 BIOCONDA WARNING Finished update
10:16:58 BIOCONDA INFO Unrecognized URL stats:
10:16:58 BIOCONDA INFO
10:16:58 BIOCONDA INFO Recipe status statistics:
10:16:58 BIOCONDA INFO NoRecognizedSourceUrl: 1
10:16:58 BIOCONDA INFO SUM: 1

@rdenise helped me investigate the source of the issue and why NoRecognizedSourceUrl may be reported here. I'll try to summarize our observations, but as I don't use Python, please forgive any imprecise or inaccurate terminology and statements.

To my understanding, the get_versions function of the UpdateVersion class attempts to parse the URL according to regexes found in the Hoster, GithubBase and GithubRelease classes:

https://github.com/bioconda/bioconda-utils/blob/master/bioconda_utils/hosters.py#L142C1-L145C80

https://github.com/bioconda/bioconda-utils/blob/master/bioconda_utils/hosters.py#L342C7-L353

https://github.com/bioconda/bioconda-utils/blob/master/bioconda_utils/hosters.py#L358C1-L360C89

We tested these regexes against the source URL for poseidon-trident (https://github.com/poseidon-framework/poseidon-hs/releases/download/v{{ version }}/trident-conda-linux) given in the recipe, and they failed to match it. The following test script shows our experiments:

import re

version = r"(?:(?<=[/._-])[rv])?(?P<version>\d[\da-zA-Z\-+\.:\~_]{0,30})"
ext = r"(?P<ext>(?i)\.(?:(?:(tar\.|t)(?:xz|bz2|gz))|zip|jar))"
account = r"(?P<account>[-\w]+)"
project = r"(?P<project>[-.\w]+)"
prefix = r"(?P<prefix>[-_./\w]+?)"
tag = r"{prefix}??{version}"
fname = r"(?P<fname>[^/]+)"
link = r"/{account}/{project}/releases/download/{tag}/{fname}{ext}?"

version_re = re.compile(version)
prefix_re = re.compile(prefix)
tag_re = re.compile(tag)
link_re = re.compile(link)

print(re.search(prefix_re, "v1.3.0.4"))
print(re.search(version_re, "v1.3.0.4"))
print(re.search(tag_re, "v1.3.0.4"))

print(re.search(link_re, "https://github.com/poseidon-framework/poseidon-hs/releases/download/v1.3.0.4/trident-conda-linux"))

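The searches for prefix and version each work on their own, but the one for tag fails, and I assume the one for link fails as a consequence. With plain r-strings, placeholders such as {prefix} and {version} are never substituted, so the compiled pattern looks for that literal text. Note that it does work if the r-strings for tag and link are replaced by f-strings.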
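The f-string observation can be reproduced with a small variation of the script. This is only a sketch: it fills the placeholders with str.format, which has the same effect as f-strings here, and it says nothing about how bioconda-utils itself expands these templates internally. The names tag_template, link_template and raw_tag_match are made up for this example, and the ext pattern is adapted to use a scoped (?i:...) group, since Python 3.11 and later reject a global (?i) flag in the middle of a pattern.

```python
import re

# Patterns copied from the test script, with one adaptation (an assumption of
# this sketch): ext uses a scoped (?i:...) group instead of a mid-pattern (?i),
# which Python 3.11+ refuses to compile.
version = r"(?:(?<=[/._-])[rv])?(?P<version>\d[\da-zA-Z\-+\.:\~_]{0,30})"
ext = r"(?P<ext>(?i:\.(?:(?:(tar\.|t)(?:xz|bz2|gz))|zip|jar)))"
account = r"(?P<account>[-\w]+)"
project = r"(?P<project>[-.\w]+)"
prefix = r"(?P<prefix>[-_./\w]+?)"
fname = r"(?P<fname>[^/]+)"

# Templates (hypothetical names): the {name} placeholders are ordinary text
# until something fills them in.
tag_template = r"{prefix}??{version}"
link_template = r"/{account}/{project}/releases/download/{tag}/{fname}{ext}?"

url = (
    "https://github.com/poseidon-framework/poseidon-hs"
    "/releases/download/v1.3.0.4/trident-conda-linux"
)

# Unexpanded template: the braces and placeholder names are matched literally.
raw_tag_match = re.search(tag_template, "v1.3.0.4")

# Expanded templates: each placeholder is replaced by its sub-pattern first.
tag = tag_template.format(prefix=prefix, version=version)
link = link_template.format(
    account=account, project=project, tag=tag, fname=fname, ext=ext
)
tag_match = re.search(tag, "v1.3.0.4")
link_match = re.search(link, url)

print(raw_tag_match)
print(tag_match.group("version"))
print(link_match.groupdict())
```

If the expanded link pattern matches the URL, as it does in this sketch, the regexes themselves are probably not the reason for the NoRecognizedSourceUrl status, and the cause may lie elsewhere in how the recipe's source URL is processed.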

Could you please check if this is indeed a bug in the bioconda-utils code or whether our source URL is somehow ill-formatted?

nevrome, Nov 03 '23 10:11