RepeatMasker
"species not known" for some ambiguous species names
Hi, I have the same "species not known" issue. It works for 'human', but not for 'drosophila' or 'anopheles'.
Setup: manual installation, Repbase not installed, Dfam release 3.4.
./RepeatMasker -engine wublast -s drosophila my.fa
./RepeatMasker -engine wublast -s Drosophila my.fa
RepeatMasker version 4.1.2-p1
Search Engine: ABBlast/WUBlast [ 3.0 ]
Using Master RepeatMasker Database: /mycomputer/RepeatMasker/Libraries/RepeatMaskerLib.h5
Title : Dfam
Version : 3.4
Date : 2021-07-21
Families : 281,951
Species "drosophila" is not known to RepeatMasker. There may
not be any TE families defined in the libraries for this
species/clade or there may be an error in the spelling.
Please check your entry against the NCBI Taxonomy database
and/or try using a broader clade or related species instead.
The full list of species/clades defined in the library may be
obtained using the famdb.py script.
Originally posted by @RadPa in https://github.com/rmhubley/RepeatMasker/issues/122#issuecomment-895757308
It looks like this happens because several taxa share the same name: Drosophila is both a genus and a subgenus, and Anopheles is a genus, a subgenus, and a series. RepeatMasker used to handle this fine, but apparently no longer does. To work around the problem, you can pass one of these more precise names to -species: drosophila_flies_genus or anopheles_genus.
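To illustrate why an ambiguous name can surface as "not known", here is a minimal Python sketch of a strict name lookup. It is not RepeatMasker's or famdb.py's code. NAME_INDEX, strict_lookup and describe_lookup are hypothetical, and the index is a toy: only drosophila_flies_genus and anopheles_genus come from this thread, while the other unique names are placeholders. If a lookup accepts a name only when exactly one taxon matches, it treats "several matches" exactly like "no match".

```python
from typing import Dict, List, Optional

# Toy name index (hypothetical, not read from Dfam/famdb): a name a user
# might type -> every unique taxon name it matches. Only
# drosophila_flies_genus and anopheles_genus come from the thread; the
# other unique names are illustrative placeholders.
NAME_INDEX: Dict[str, List[str]] = {
    "human": ["homo_sapiens"],
    "drosophila": ["drosophila_flies_genus", "drosophila_subgenus"],
    "drosophila_flies_genus": ["drosophila_flies_genus"],
    "anopheles": ["anopheles_genus", "anopheles_subgenus", "anopheles_series"],
    "anopheles_genus": ["anopheles_genus"],
}


def strict_lookup(name: str) -> Optional[str]:
    """Return the taxon only if exactly one matches, otherwise None.

    Treating "several matches" like "no match" is what turns an
    ambiguous name into a "not known" error.
    """
    matches = NAME_INDEX.get(name.lower(), [])
    return matches[0] if len(matches) == 1 else None


def describe_lookup(name: str) -> str:
    """Report unknown and ambiguous names separately."""
    matches = NAME_INDEX.get(name.lower(), [])
    if not matches:
        return f'Species "{name}" is not known'
    if len(matches) > 1:
        return f'Species "{name}" is ambiguous; try one of: ' + ", ".join(matches)
    return matches[0]
```

With this toy index, strict_lookup("Drosophila") returns None, but strict_lookup("drosophila_flies_genus") succeeds. That mirrors why the more precise names work as a workaround. describe_lookup sketches how reporting ambiguity separately could make the error message more helpful.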
In past versions, RepeatMasker used a built-in list of special species names, including both "drosophila" and "anopheles", to make sure those were always interpreted correctly. However, it looks like that list may no longer be consulted at this particular step. @rmhubley, do you know why this is going wrong? I remember updating Taxonomy.py to make sure it handled the synonyms whenever it invoked famdb.py, but maybe something else has changed and it no longer refers to them somewhere it should.
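Below is a minimal Python sketch of the special-names idea: rewrite a known ambiguous name to one precise taxon *before* the exact lookup runs. It is not the actual Taxonomy.py logic. SPECIAL_NAMES, NAME_INDEX, lookup and resolve_species are hypothetical, and the mappings use the precise names suggested above plus illustrative placeholders.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the built-in list of special species names
# mentioned in the thread; not RepeatMasker's actual table.
SPECIAL_NAMES: Dict[str, str] = {
    "drosophila": "drosophila_flies_genus",
    "anopheles": "anopheles_genus",
}

# Toy taxonomy index (illustrative, not read from famdb):
# name -> every unique taxon name it matches.
NAME_INDEX: Dict[str, List[str]] = {
    "human": ["homo_sapiens"],
    "drosophila": ["drosophila_flies_genus", "drosophila_subgenus"],
    "drosophila_flies_genus": ["drosophila_flies_genus"],
    "anopheles": ["anopheles_genus", "anopheles_subgenus", "anopheles_series"],
    "anopheles_genus": ["anopheles_genus"],
}


def lookup(name: str) -> Optional[str]:
    """Exact lookup: succeeds only when the name matches a single taxon."""
    matches = NAME_INDEX.get(name, [])
    return matches[0] if len(matches) == 1 else None


def resolve_species(name: str) -> Optional[str]:
    """Apply the special-names table first, then do the exact lookup."""
    key = name.lower()
    key = SPECIAL_NAMES.get(key, key)  # leave names without a synonym unchanged
    return lookup(key)
```

In this sketch, resolve_species("Drosophila") resolves to drosophila_flies_genus, while calling lookup("drosophila") directly returns None. Any code path that skips the synonym step would therefore reproduce the "not known" symptom. That fits the hypothesis above, but it has not been confirmed.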
That worked, thank you.