abricate

Enable protein query with tblastn

Open crarlus opened this issue 6 years ago • 7 comments

Hi, is it feasible to use abricate for mass screening of protein sequences instead of genes (i.e. use tblastn instead of blastn)? Thanks, Carlus

crarlus avatar Oct 24 '17 06:10 crarlus

It's a good question @crarlus - tblastn could just be a drop-in replacement, but there is lots of business logic that assumes DNA coordinates etc.

What database do you use that is protein only?

tseemann avatar Mar 18 '18 00:03 tseemann

My collaborator has a curated list of proteins collected from various resources. Of course I tried (hard) to map them back to the original gene sequences, e.g. via a blast to uniprot and uniparc databases. However I could recover only some but not all of the sequences. So it might be an odd case but at the same time the data reality we live in.

crarlus avatar Mar 20 '18 14:03 crarlus

I have started adding tblastn support but it is extremely slow due to the way I am using the genome as the query... it's not committed yet.

tseemann avatar Apr 07 '18 05:04 tseemann

Pipe it with Prokka and you can use blastp after gene prediction. I mean, a brand new version, ABRICATE+ (including amino acid databases).

felipelira avatar Apr 12 '18 16:04 felipelira

Prokka relies on Prodigal to detect genes/ORFs, and often misses broken genes, or genes that are falsely frameshifted due to homopolymer errors in 454/ION/Pacbio/Minion assemblies.

tseemann avatar Apr 25 '18 06:04 tseemann

Hi, thank you so much for sharing ABRicate, @tseemann. I just want to second the suggestion to improve the protein database option. I am currently using abricate with a protein database of Pfam families (ca. 13000 protein sequences) to screen putative plasmids for replication protein sequences. It does take a very long time :) I'm sure you are aware of e.g. Diamond https://github.com/bbuchfink/diamond, which should work much faster than blast.

thsyd avatar Mar 13 '19 07:03 thsyd

@thsyd Diamond only provides blastp (Prot:Prot) and blastx (Prot:DNA). Unfortunately, the design of abricate needs the query to be the contigs, so I need tblastn (DNA:Prot).

I don't think Abricate is the best tool for what you want to do. Just running BLAST or MMSeqs2 or DIAMOND directly would make more sense.

tseemann avatar Oct 06 '19 22:10 tseemann
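
For anyone landing here later: a minimal sketch of what "just running BLAST directly" could look like for a protein list against assembled contigs. This is not part of abricate and not the maintainer's method. It assumes NCBI BLAST+ (`tblastn`) is on the PATH, and the file names, the helper names (`build_tblastn_cmd`, `parse_outfmt6`, `filter_hits`, `screen`) and the identity/length cut-offs are all hypothetical choices for illustration.

```python
import csv
import io
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def build_tblastn_cmd(proteins_faa, contigs_fna, evalue=1e-10):
    """Build a tblastn command: protein queries against translated contigs."""
    return [
        "tblastn",
        "-query", proteins_faa,    # protein FASTA (hypothetical path)
        "-subject", contigs_fna,   # nucleotide FASTA, searched without makeblastdb
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_outfmt6(text):
    """Parse tab-separated -outfmt 6 text into a list of typed dicts."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def filter_hits(hits, min_identity=80.0, min_length=50):
    """Keep hits above illustrative thresholds (alignment length in residues)."""
    return [h for h in hits
            if h["pident"] >= min_identity and h["length"] >= min_length]


def screen(proteins_faa, contigs_fna):
    """Run tblastn and return filtered hits; requires BLAST+ to be installed."""
    result = subprocess.run(
        build_tblastn_cmd(proteins_faa, contigs_fna),
        capture_output=True, text=True, check=True,
    )
    return filter_hits(parse_outfmt6(result.stdout))
```

Usage would be something like `screen("my_proteins.faa", "contigs.fna")`. Note that in tblastn output the subject coordinates (`sstart`, `send`) are nucleotide positions on the contig while the query coordinates are amino acid positions, which is the kind of mixed coordinate handling mentioned earlier in the thread.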