rBLAST
Getting taxonomic information from a standard blast database
Hi
Great package. I am trying to get taxonomic information out of the BLAST call. I have added a custom format argument of '6 staxids'.
I am running blast 2.10.10. I downloaded the Microbial16S database today and the files taxdb.btd and taxdb.bti are already in the database, so I was assuming (perhaps naively) it should work out of the box.
My normal BLAST searches work using predict(), but including the custom format argument gives the warning
Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
and the command errors.
Any pointers much appreciated.
Hi,
I have not tried this. I probably need to change something in the R code to make this work, but I am not quite sure what. Maybe blastn needs an argument to know where to find the taxdb files. Please let me know if you figure this out so I can add it to the package.
Thanks, Michael
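One thing that may be worth trying, as a sketch rather than a confirmed fix: BLAST+ looks for the taxdb files in the current working directory and in the directories listed in the BLASTDB environment variable, so setting that variable from R before calling predict() could let blastn find them. The helper name set_blastdb and the folder name in the usage comments are hypothetical.

```r
# Sketch: point BLASTDB at the folder that holds taxdb.btd and taxdb.bti.
# `db_dir` is a hypothetical folder name; adjust it to your own layout.
set_blastdb <- function(db_dir) {
  needed <- file.path(db_dir, c("taxdb.btd", "taxdb.bti"))
  missing <- needed[!file.exists(needed)]
  if (length(missing) > 0) {
    stop("taxdb files not found: ", paste(missing, collapse = ", "))
  }
  # Child processes started from R (such as blastn) inherit this variable
  Sys.setenv(BLASTDB = normalizePath(db_dir))
  invisible(Sys.getenv("BLASTDB"))
}

# Possible usage (assumes the database lives in "16S_ribosomal_RNADB"):
# set_blastdb("16S_ribosomal_RNADB")
# bl <- blast(db = "16S_ribosomal_RNADB/16S_ribosomal_RNA")
# res <- predict(bl, seqs, custom_format = "qseqid sseqid pident staxids")
```

Whether this removes the warning in a given setup is untested here; it only addresses the case where blastn cannot locate the taxdb files.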
Hi
I have a similar question, but in my case the call to the blast function itself fails. I downloaded the database from NCBI and put all the unzipped files into a folder, but when I try to load the database into R using the blast function,
bl <- blast(db = "16S_ribosomal_RNADB/taxdb.btd")
it fails saying
Error: Executable for blastn not found! Please make sure that the software is correctly installed and, if necessary, path variables are set.
The path doesn't seem to be the problem, since I get a different error when I point to a non-existing folder.
What am I missing? I am running rBLAST_0.99.2
I got BLAST+ from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST//ncbi-blast-2.11.0+-win64.exe
but I am not sure R is actually seeing it.
What does Sys.which("blastn") say?
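If Sys.which("blastn") returns an empty string, R cannot find the executable on its PATH. A minimal sketch of one way to handle that for the current session is below; the helper name ensure_blastn and the install folder in the usage comment are hypothetical, not part of rBLAST. Separately, the working call later in this thread passes the database base name rather than an individual file such as taxdb.btd, so that may be worth trying as well.

```r
# Sketch: check whether R can see blastn and, if not, prepend the BLAST+
# bin folder to PATH for this R session only.
# `blast_bin` is a hypothetical install location.
ensure_blastn <- function(blast_bin) {
  if (nzchar(Sys.which("blastn"))) {
    return(invisible(TRUE))  # already visible, nothing to do
  }
  if (!dir.exists(blast_bin)) {
    stop("BLAST+ bin folder not found: ", blast_bin)
  }
  Sys.setenv(PATH = paste(normalizePath(blast_bin), Sys.getenv("PATH"),
                          sep = .Platform$path.sep))
  # Report whether blastn is visible after the change
  invisible(unname(nzchar(Sys.which("blastn"))))
}

# Possible usage on Windows (the path is an assumption):
# ensure_blastn("C:/Program Files/NCBI/blast-2.11.0+/bin")
# Sys.which("blastn")
```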
First of all, rBLAST is really a fantastic package!!! I have used it to develop several functions and I hope the developers can allow me to share all these functions and create them into new packages.
For this question (I guess I am not the only one who wants to solve it), I found a temporary solution that I hope can help. The custom_format (BLAST output format) has a little-known field called "stitle" (subject title), which also shows the organism name. Interestingly, this field only works if there is a blank column before it. (I guess this is because it contains a ",", and in the source code the output format is 10, which separates fields with ",".) My code is below:
Species <- blast("C:/Users/user/Documents/Database/16S/16S_ribosomal_RNA")
A <- predict(Species, data1, BLAST_args = " -max_target_seqs 1 -perc_identity 95 ", custom_format = "qseqid sseqid pident length slen mismatch gaps evalue \t stitle")
Note that there must be an empty column (space, \t, space) before stitle!
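The guess about the comma can be illustrated with plain R, without running BLAST. The sketch below uses a made-up result line (not real BLAST output) and assumes the fields are split on "," with no quoting; it shows how a title containing a comma yields one field too many, and one way to re-join the tail.

```r
# Sketch: a comma inside stitle breaks naive comma-separated parsing.
# `line` is a made-up example, not real BLAST output.
line <- "query1,subject1,99.8,Escherichia coli 16S ribosomal RNA, partial sequence"
cols <- c("qseqid", "sseqid", "pident", "stitle")

fields <- strsplit(line, ",", fixed = TRUE)[[1]]
length(fields)  # 5 fields for 4 columns

# Re-join everything from the stitle position onward into one field
n <- length(cols)
fixed <- c(fields[seq_len(n - 1)],
           paste(fields[n:length(fields)], collapse = ","))
names(fixed) <- cols
fixed[["stitle"]]
```

This only works when stitle is the last column, which is also how it is used in the call above.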
To add to the previous answer: if you are identifying bacteria, it may be more convenient to download the .fna file and run makeblastdb yourself. The link is ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz. You still need the \t before stitle.
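A rough sketch of that workflow from R, assuming the .fna.gz file has already been downloaded and that makeblastdb is on the PATH. The helper names gunzip_fasta and build_16s_db and the file names in the usage comments are hypothetical.

```r
# Sketch: decompress a downloaded FASTA file and build a nucleotide database.
gunzip_fasta <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con))
  # Reads the whole file into memory, which is fine for a 16S-sized file
  writeLines(readLines(con), out_path)
  out_path
}

build_16s_db <- function(fasta, db_name) {
  if (!nzchar(Sys.which("makeblastdb"))) {
    stop("makeblastdb not found on PATH")
  }
  status <- system2("makeblastdb",
                    args = c("-in", shQuote(fasta),
                             "-dbtype", "nucl",
                             "-out", shQuote(db_name)))
  if (status != 0) stop("makeblastdb failed with status ", status)
  db_name
}

# Possible usage (file names are assumptions):
# fasta <- gunzip_fasta("bacteria.16SrRNA.fna.gz")
# db <- build_16s_db(fasta, "bacteria_16S")
# bl <- blast(db = db)
```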
Hi @yzhong005: Thanks for sharing this! It would be great if you make a pull request for your functions.
I have just opened a new pull request called Species ID. Feel free to edit it. Thank you!
Thank you for sharing! I was looking for this solution.
Thank you. Yes, it seems stitle is not read correctly. I have changed the code, so it works now without the extra "\t".