abricate
Enable protein query with tblastn
Hi, is it feasible to use abricate for mass screening of protein sequences instead of genes (i.e. use tblastn instead of blastn)? Thanks, Carlus
It's a good question, @crarlus. tblastn could just be a drop-in replacement, but there is lots of business logic that assumes DNA coordinates etc.
What database do you use that is protein only?
My collaborator has a curated list of proteins collected from various resources. Of course I tried (hard) to map them back to the original gene sequences, e.g. via a blast to uniprot and uniparc databases. However I could recover only some but not all of the sequences. So it might be an odd case but at the same time the data reality we live in.
I have started adding tblastn support, but it is extremely slow due to the way I am using the genome as the query... it's not committed yet.
Pipe it with Prokka and you can use blastp after gene prediction. I mean a brand new version, ABRICATE+ (including amino acid databases).
Prokka relies on Prodigal to detect genes/ORFs, and often misses broken genes, or genes falsely frameshifted by homopolymer errors in 454/Ion Torrent/PacBio/MinION assemblies.
Hi, thank you so much for sharing ABRicate, @tseemann. I just want to second the suggestion to improve the protein database option. I am currently using abricate with a protein database of Pfam families (ca. 13000 protein sequences) to screen putative plasmids for replication protein sequences. It does take a very long time :) I'm sure you are aware of e.g. Diamond (https://github.com/bbuchfink/diamond), which should work much faster than BLAST.
@thysd Diamond only provides blastp (Prot:Prot) and blastx (Prot:DNA). Unfortunately the design of abricate needs the query to be the contigs; I need tblastn (DNA:Prot).
I don't think Abricate is the best tool for what you want to do. Just running BLAST or MMSeqs2 or DIAMOND directly would make more sense.
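For anyone landing here who wants to try the "just run BLAST directly" route, here is a minimal Python sketch, not anything abricate does internally. It assumes BLAST+ is installed and that `tblastn` is on your PATH; the function names and file names (`proteins.faa`, `contigs.fna`) are made up for illustration. It builds a `tblastn` call with the proteins as query and the contigs as subject, and parses the default 12-column tabular output (`-outfmt 6`).

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def tblastn_command(proteins_faa, contigs_fna, evalue=1e-10):
    """Build a tblastn argument list: protein queries vs. a nucleotide subject."""
    return [
        "tblastn",
        "-query", proteins_faa,    # protein FASTA (hypothetical file name)
        "-subject", contigs_fna,   # assembly FASTA, searched in all six frames
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_outfmt6(text):
    """Turn tabular BLAST output into a list of dicts with typed values."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        # With tblastn, query coordinates are in amino acids while subject
        # coordinates are in nucleotides; sstart > send means a reverse-strand hit.
        hit["strand"] = "-" if hit["sstart"] > hit["send"] else "+"
        hits.append(hit)
    return hits


def run_tblastn(proteins_faa, contigs_fna, evalue=1e-10):
    """Run tblastn (must be installed) and return the parsed hits."""
    result = subprocess.run(
        tblastn_command(proteins_faa, contigs_fna, evalue),
        capture_output=True, text=True, check=True,
    )
    return parse_outfmt6(result.stdout)
```

Something like `run_tblastn("proteins.faa", "contigs.fna")` would then give you one dict per hit. The mixed coordinate systems (amino acids on the query side, nucleotides on the contig side) are probably one reason the DNA-coordinate logic mentioned earlier in the thread doesn't carry over directly.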