biomartr

Support genome retrieval for misclassified species

Open · lutteropp opened this issue 7 years ago • 2 comments

Hi,

I encountered some inconsistent behavior: is.genome.available("Candida glabrata", db = "genbank") returns TRUE, but getGenome(db = "genbank", organism = "Candida glabrata") reports "No reference genome or representative genome was found for 'Candida glabrata'. Thus, download for this species has been omitted."

I checked the database manually and found the entry: https://www.ncbi.nlm.nih.gov/genome/?term=candida+glabrata

The following works: getGenome(db = "genbank", organism = "[Candida] glabrata")

It would be nice to download the genome directly without having to add these brackets. For a genome download, it probably does not matter whether a species has been misclassified or not (see https://support.ncbi.nlm.nih.gov/link/portal/28045/28049/Article/1473/Why-do-I-see-square-brackets-around-some-organism-names-in-the-Taxonomy-database).

Best, Sarah

lutteropp, Dec 18 '17 15:12

Hi Sarah,

thank you so much for making me aware of this issue!

You can also use the new argument reference = FALSE to download the genome using the name Candida glabrata instead of [Candida] glabrata:

getGenome(db = "genbank", organism = "Candida glabrata", reference = FALSE)

is.genome.available() returns TRUE because it checks the availability of all genomes, not only reference genomes. I will try to make this clearer in the documentation and may also print a message when is.genome.available() is called.

Do you know of other cases where this kind of name confusion can happen? I know that bacterial names sometimes contain parentheses (), and I already handle those. I will now also handle square brackets []. Do you know of any other cases?
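To illustrate the kind of bracket handling discussed here, the snippet below is a minimal sketch, not biomartr's actual implementation. It assumes you already have a character vector of organism names (for example, taken from an NCBI assembly summary) and compares names after removing the square brackets that NCBI Taxonomy places around misclassified genera. The helpers `strip_brackets()` and `match_organism()` are hypothetical and do not exist in biomartr.

```r
# Hypothetical helpers (not part of biomartr): compare organism names while
# ignoring the square brackets NCBI Taxonomy uses for misclassified genera.
strip_brackets <- function(x) {
  # Remove every "[" and "]", then collapse and trim whitespace
  trimws(gsub("\\s+", " ", gsub("\\[|\\]", "", x)))
}

match_organism <- function(query, available) {
  # Return the entries of `available` whose bracket-free form
  # equals the bracket-free form of `query`
  available[strip_brackets(available) == strip_brackets(query)]
}

# Example with a hypothetical vector of organism names
available <- c("[Candida] glabrata", "Candida albicans", "Saccharomyces cerevisiae")
match_organism("Candida glabrata", available)
# returns "[Candida] glabrata"
```

The matched name could then be passed on unchanged, e.g. as getGenome(db = "genbank", organism = "[Candida] glabrata"), which the thread above confirms works. Returning the original bracketed spelling, rather than the stripped one, keeps the name usable for lookups that expect the exact NCBI form.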

I truly appreciate your help and hope that I can make biomartr even more useful over time :-)

Kind regards, Hajk

HajkD, Dec 20 '17 13:12

Hi Hajk,

you're welcome. :-) I don't know of other cases where this kind of name confusion can happen. I'm a computer scientist who only recently started with bioinformatics, and this was my first time trying to download lots of genome data. Luckily, I stumbled upon your very nice tool. Keep up the great work!

Best regards, Sarah

lutteropp, Dec 22 '17 12:12

I assume this is solved now.

HajkD, Sep 27 '23 13:09