snakemake-wrappers
ENSEMBL-SEQUENCE does not work for all species
Snakemake version
Snakemake: 8.15.2
Wrapper: "v3.13.6/bio/reference/ensembl-sequence"
Describe the bug
The download path has a hard-coded structure in the wrapper:
spec = ("{build}" if int(release) > 75 else "{build}.{release}").format(
build=build, release=release
)
url_prefix = f"{url}/{branch}release-{release}/fasta/{species}/{datatype}/{species.capitalize()}.{spec}"
This uses a hard-coded check for release > 75. However, for some species the path structure differs. For instance, A. thaliana is currently in plants release 59, and because 59 is not greater than 75, the wrapper appends the release number to the spec part of the filename, even though the actual file does not have it.
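To make the failure concrete, here is a minimal sketch that isolates the spec expression from the snippet above. The helper name `spec_for` is hypothetical and only wraps the wrapper's one-liner, and the GRCh38/112 pair is just an illustrative input with a release above 75.

```python
def spec_for(build: str, release: str) -> str:
    # Same expression as in the wrapper snippet: the release number is
    # appended whenever release <= 75, regardless of the Ensembl branch.
    return ("{build}" if int(release) > 75 else "{build}.{release}").format(
        build=build, release=release
    )


# Plants release 59 falls below the threshold, so the release is appended.
print(spec_for("TAIR10", "59"))   # TAIR10.59
# A release above 75 yields the build name alone.
print(spec_for("GRCh38", "112"))  # GRCh38
```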
The correct file name is
Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
but the wrapper only checks for
Arabidopsis_thaliana.TAIR10.59.[dna.primary_assembly.fa.gz|dna.toplevel.fa.gz]
which has the additional 59 that should not be there. Hence, the download fails. I think a simple fix is to avoid the hard-coded 75 and instead check both variants of the path.
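A rough sketch of what checking both variants could look like, not the wrapper's actual code. The function names `candidate_prefixes` and `first_existing`, the `exists` callback, and the `https://example.org/pub` base URL are all assumptions made for illustration. In the wrapper, `exists` would be whatever remote check it already performs.

```python
from typing import Callable, Iterable, List, Optional


def candidate_prefixes(
    url: str, branch: str, release: str, species: str, datatype: str, build: str
) -> List[str]:
    """Return both possible URL prefixes instead of choosing one via the 75 threshold."""
    base = (
        f"{url}/{branch}release-{release}/fasta/{species}/{datatype}/"
        f"{species.capitalize()}"
    )
    # Variant without the release number first, then the variant with it.
    return [f"{base}.{build}", f"{base}.{build}.{release}"]


def first_existing(urls: Iterable[str], exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which `exists` reports True, or None if none do."""
    for candidate in urls:
        if exists(candidate):
            return candidate
    return None


# Hypothetical usage with a stand-in for the remote existence check.
prefixes = candidate_prefixes(
    url="https://example.org/pub",  # placeholder, not the real Ensembl URL
    branch="plants/",
    release="59",
    species="arabidopsis_thaliana",
    datatype="dna",
    build="TAIR10",
)
urls = [f"{prefix}.dna.toplevel.fa.gz" for prefix in prefixes]
# Pretend only the file without the release number exists on the server.
fake_exists = lambda u: u.endswith("Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz")
found = first_existing(urls, fake_exists)
```

Trying the variants in order keeps the current behaviour for species where the threshold rule happens to hold, while also covering cases like A. thaliana where it does not.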