biomartr
Support genome retrieval for misclassified species
Hi,
I encountered some odd behavior:
is.genome.available("Candida glabrata", db = "genbank")
returns TRUE, but getGenome(db = "genbank", organism = "Candida glabrata")
claims that "No reference genome or representative genome was found for 'Candida glabrata'. Thus, download for this species has been omitted.".
I checked manually in the database and found the entry: https://www.ncbi.nlm.nih.gov/genome/?term=candida+glabrata
The following works:
getGenome(db = "genbank", organism = "[Candida] glabrata")
It would be nice to directly download the genome without having to use these brackets, because when downloading a genome it probably does not matter whether a species has been misclassified (see https://support.ncbi.nlm.nih.gov/link/portal/28045/28049/Article/1473/Why-do-I-see-square-brackets-around-some-organism-names-in-the-Taxonomy-database) or not.
Best, Sarah
Hi Sarah,
thank you so much for making me aware of this issue!
You can also use the new argument reference = FALSE to download the genome using only Candida glabrata and not [Candida] glabrata.
getGenome(db = "genbank", organism = "Candida glabrata", reference = FALSE)
is.genome.available() returns TRUE because it checks the availability of all genomes, not only reference genomes. I will try to make this clearer in the documentation, and I may also add a message when calling is.genome.available().
Do you know of other cases where these name confusions can happen? I know that bacterial names sometimes use (), and I already take care of that one. I will now also handle []. Do you know of any others?
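As a rough illustration of the bracket handling mentioned above, here is a minimal sketch in base R. It is not biomartr's actual implementation: normalize_organism_name() is a hypothetical helper, and it assumes the only difference between the two spellings is the pair of square brackets that NCBI Taxonomy puts around a genus name it considers misclassified.

```r
# Hypothetical helper, not part of biomartr: strip the square brackets
# that NCBI Taxonomy places around a misclassified genus name.
normalize_organism_name <- function(name) {
  # Remove literal "[" and "]" characters (fixed = TRUE avoids regex escaping).
  cleaned <- gsub("[", "", name, fixed = TRUE)
  cleaned <- gsub("]", "", cleaned, fixed = TRUE)
  # Collapse runs of whitespace and trim both ends.
  trimws(gsub("[[:space:]]+", " ", cleaned))
}

# Both spellings reduce to the same key, so they can be compared directly.
normalize_organism_name("[Candida] glabrata")  # "Candida glabrata"
normalize_organism_name("Candida glabrata")    # "Candida glabrata"
```

Comparing normalized names on both sides (the user's query and the names listed in the database summary) would let "Candida glabrata" match "[Candida] glabrata" without the user typing the brackets.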
I truly appreciate your help and hope that I can make biomartr even more useful over time :-)
Kind regards, Hajk
Hi Hajk,
you're welcome. :-) I do not know about other cases where these name confusions can happen - I'm just a computer scientist (only recently started with bioinformatics) and this was my first time trying to download lots of genome data. Luckily, I stumbled upon your very nice tool. Keep up the great work!
Best regards, Sarah
I assume this is solved now.