rBLAST

Getting taxonomic information from a standard blast database

[Open] padpadpadpad opened this issue 4 years ago • 7 comments

Hi

Great package. I am trying to get taxonomic information out of the BLAST call. I have added a custom format argument of '6 staxids'.

I am running BLAST 2.10.10. I downloaded the Microbial16S database today, and the files taxdb.btd and taxdb.bti are already in the database folder, so I assumed (perhaps naively) that it would work out of the box.

My normal BLAST searches work using predict(), but including the custom format argument gives the warning "Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz", and then the command errors.

Any pointers much appreciated.

padpadpadpad commented Dec 23 '19

Hi,

I have not tried this. I probably need to change something in the R code to make this work, but I am not quite sure what. Maybe blastn needs an argument telling it where to find the taxdb file. Please let me know if you figure this out so I can add it to the package.

Thanks, Michael

mhahsler commented Jan 02 '20
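BLAST+ looks for taxdb.btd and taxdb.bti in the current working directory or in the folder named by the BLASTDB environment variable, so one thing worth trying is setting BLASTDB from R before calling predict(). The sketch below assumes the taxdb files sit next to the database files; use_taxdb_dir() is a hypothetical helper, not an rBLAST function, and the folder and database names in the commented usage are placeholders. It also assumes, based on the comment further down this thread that rBLAST requests output format 10 itself, that custom_format should list only field names (such as staxids and sscinames) without a leading "6".

```r
# Hypothetical helper (not part of rBLAST): check that taxdb.btd/taxdb.bti exist in
# 'db_dir', then point the BLASTDB environment variable at that folder, which is
# one of the places BLAST+ searches for the taxonomy files.
use_taxdb_dir <- function(db_dir) {
  db_dir <- normalizePath(db_dir, mustWork = TRUE)
  needed <- file.path(db_dir, c("taxdb.btd", "taxdb.bti"))
  missing_files <- basename(needed[!file.exists(needed)])
  if (length(missing_files) > 0) {
    stop("taxdb files not found in ", db_dir, ": ",
         paste(missing_files, collapse = ", "))
  }
  # Processes started from this R session (such as blastn) inherit this variable
  Sys.setenv(BLASTDB = db_dir)
  invisible(db_dir)
}

# Hypothetical usage; folder/database names and 'seqs' (a DNAStringSet) are placeholders:
# library(rBLAST)
# use_taxdb_dir("16S_ribosomal_RNADB")
# bl <- blast(db = "16S_ribosomal_RNADB/16S_ribosomal_RNA")
# predict(bl, seqs, custom_format = "qseqid sseqid pident evalue staxids sscinames")
```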

Hi

I have a similar question, but in my case the call to blast() itself fails. I downloaded the database from NCBI and put all the unzipped files in a folder, but when I try to load the database into R using the blast() function,

bl <- blast(db = "16S_ribosomal_RNADB/taxdb.btd")

it fails saying

Error: Executable for blastn not found! Please make sure that the software is correctly installed and, if necessary, path variables are set.

The path doesn't seem to be the problem, since I get a different error when I point to a non-existent folder.

What am I missing? I am running rBLAST_0.99.2

I installed BLAST+ from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST//ncbi-blast-2.11.0+-win64.exe, but I am not sure R is actually seeing it.

demar01 commented Dec 11 '20

What does Sys.which("blastn") say?

mhahsler commented Dec 12 '20
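If Sys.which("blastn") returns an empty string, R cannot find the executable on PATH, which matches the error above. A minimal sketch for adding the BLAST+ bin folder to PATH for the current R session follows; add_blast_to_path() is a hypothetical helper, and the install folder in the commented call is an assumed default for the 2.11.0 Windows installer, so adjust it to wherever blastn.exe actually lives.

```r
# Hypothetical helper (not part of rBLAST): append the BLAST+ "bin" folder to PATH
# for the current R session, then return what Sys.which("blastn") finds.
add_blast_to_path <- function(blast_bin) {
  blast_bin <- normalizePath(blast_bin, mustWork = TRUE)
  old_path <- Sys.getenv("PATH")
  entries <- strsplit(old_path, .Platform$path.sep, fixed = TRUE)[[1]]
  if (!blast_bin %in% entries) {
    new_path <- if (nzchar(old_path)) {
      paste(old_path, blast_bin, sep = .Platform$path.sep)
    } else {
      blast_bin
    }
    Sys.setenv(PATH = new_path)
  }
  Sys.which("blastn")  # an empty string means blastn is still not found
}

# Assumed install location; adjust to your system:
# add_blast_to_path("C:/Program Files/NCBI/blast-2.11.0+/bin")
```

To make the change permanent, the same folder can be added to the system PATH instead. Separately, the db argument of blast() normally takes the database name without a file extension (for the NCBI 16S download, something like "16S_ribosomal_RNADB/16S_ribosomal_RNA"), not taxdb.btd.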

First of all, rBLAST is a fantastic package! I have used it to develop several functions, and I hope the developers will allow me to share them and turn them into a new package.

For this question (I guess I am not the only one who wants to solve it), I found a temporary solution; I hope it helps. The custom_format (BLAST output format) argument accepts a less obvious field called "stitle" (Subject Title), which also shows the organism names. Interestingly, this field only works if there is an extra blank column before it. I guess this is because the title contains a ",": as in the source code, the output format is 10, which is separated by ",". My code is below:

Species <- blast("C:/Users/user/Documents/Database/16S/16S_ribosomal_RNA")
A <- predict(Species, data1, BLAST_args = " -max_target_seqs 1 -perc_identity 95 ", custom_format = "qseqid sseqid pident length slen mismatch gaps evalue \t stitle")

Note that there must be an empty column, written as space, \t, space, before stitle!

yzhong005 commented Oct 28 '21
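Once stitle comes back, the organism name usually sits at the start of the title. Below is a small post-processing sketch that keeps the first two words as a "Genus species" label; species_from_stitle() is hypothetical, and the example titles are illustrative strings, not real BLAST output.

```r
# Hypothetical helper (not part of rBLAST): turn BLAST subject titles into
# "Genus species" labels, assuming each title starts with the binomial name.
species_from_stitle <- function(stitle) {
  words <- strsplit(trimws(stitle), "[[:space:]]+")
  vapply(words, function(w) paste(head(w, 2), collapse = " "), character(1))
}

# Illustrative titles in the style of the NCBI 16S database
titles <- c(
  "Bacillus subtilis strain IAM 12118 16S ribosomal RNA",
  "Escherichia coli strain U 5/41 16S ribosomal RNA"
)
species_from_stitle(titles)
```

It can be applied to whichever column of the predict() result holds the title text. Titles that do not start with a binomial name (for example entries beginning with "Candidatus") would need extra handling.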

To add to the previous answer: if you are identifying bacteria, it may be more convenient to download the .fna file and build the database yourself with makeblastdb. The file is at ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz, and the \t stitle trick still applies.

yzhong005 commented Oct 28 '21
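Building the database from the RefSeq FASTA linked above can be scripted from R. This is a sketch under assumptions: gunzip_fasta() and build_16s_db() are hypothetical helpers, the URL is the one given in the comment and may have moved since, and the makeblastdb(file, dbtype = "nucl") call assumes the signature of rBLAST's makeblastdb(), so check ?makeblastdb for your installed version.

```r
# Hypothetical helper: decompress a gzipped FASTA file using base R only.
gunzip_fasta <- function(gz, out = sub("\\.gz$", "", gz)) {
  con <- gzfile(gz, open = "rt")
  on.exit(close(con))
  writeLines(readLines(con), out)
  invisible(out)
}

# Hypothetical wrapper: download the RefSeq 16S FASTA and build a nucleotide
# BLAST database from it with rBLAST's makeblastdb().
build_16s_db <- function(dest_dir = "bacteria_16S") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  url <- "ftp://ftp.ncbi.nlm.nih.gov/refseq/TargetedLoci/Bacteria/bacteria.16SrRNA.fna.gz"
  gz <- file.path(dest_dir, basename(url))
  download.file(url, gz, mode = "wb")
  fna <- gunzip_fasta(gz)
  rBLAST::makeblastdb(fna, dbtype = "nucl")
  fna  # BLAST+ names the database after the -in file when no -out is given
}

# db <- build_16s_db()
# bl <- rBLAST::blast(db = db)
```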

Hi @yzhong005: Thanks for sharing this! It would be great if you make a pull request for your functions.

mhahsler commented Oct 28 '21


I have just opened a new pull request called "Species ID". Feel free to edit it. Thank you!

yzhong005 commented Oct 29 '21

Thank you for sharing! I was looking for this solution.

babinecha commented Dec 19 '22

Thank you. Yes, it seems stitle is not read correctly. I have changed the code, so it works now without the extra "\t".

mhahsler commented Dec 19 '22