difference between qcov (ncbi blast) and qcovhsp (diamond blast)

Open terancehhwong opened this issue 2 years ago • 9 comments

Hi there,

Sorry if this is a stupid question. What is the difference between qcovs from NCBI BLAST and qcovhsp from DIAMOND? I am running blastp with TransDecoder-predicted protein-coding sequences (derived from trinity.fasta) against the NCBI nr database in DIAMOND format, and I specifically want hits with a query coverage (qcovs in NCBI BLAST) of at least 50%. Which parameter should I set in DIAMOND for that purpose? And will qcovhsp give a similar percentage to qcovs?

Also, if I would like to filter out results whose percent identity (pident) is lower than 50%, is there a parameter for that, e.g. pident -50?

Thanks a lot!

Best regards, Terance

terancehhwong avatar Nov 27 '23 06:11 terancehhwong

qcovs and qcovhsp are the same most of the time. When there are multiple HSPs in the same subject, qcovs is the combined coverage while qcovhsp is the coverage of the individual HSP. Diamond only supports qcovhsp. You can set a sequence identity filter using --id.
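To illustrate the distinction, here is a minimal sketch of the two definitions as described above. It is not code from DIAMOND or BLAST; the function names, the 1-based inclusive HSP coordinates, and the example query are assumptions made for illustration, and the real tools may round the reported percentage.

```python
def hsp_coverage(hsp, query_len):
    """Percent of the query covered by one HSP (qcovhsp-style)."""
    start, end = hsp  # 1-based, inclusive query coordinates (assumed)
    return 100.0 * (end - start + 1) / query_len


def combined_coverage(hsps, query_len):
    """Percent of the query covered by the union of all HSPs (qcovs-style)."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(hsps):
        if current_end is None or start > current_end + 1:
            # Disjoint from the running interval: count it and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / query_len


# Hypothetical 200-residue query with two HSPs against the same subject.
hsps = [(1, 60), (101, 160)]
per_hsp = [hsp_coverage(h, 200) for h in hsps]  # coverage of each HSP alone
combined = combined_coverage(hsps, 200)         # coverage of both together
```

In this example each HSP alone covers 30% of the query and would fail a 50% per-HSP threshold, while the combined coverage is 60%. With a single HSP per subject the two values coincide.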

bbuchfink avatar Nov 27 '23 10:11 bbuchfink

I see, thanks! I would also like to ask whether diamond blastp can be used to search against other common databases, such as Trinotate, KEGG, etc., the way the NCBI blastp tool can. If so, would the command and parameters be the same as when searching against the diamond nr database? Thanks again!

terancehhwong avatar Nov 27 '23 15:11 terancehhwong

Sure, you can use any database with the same command line.
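As a rough sketch of what "the same command line" could look like with only the database swapped, the helper below assembles a diamond blastp invocation as an argument list. The function name and the file names are hypothetical. The --id option and the qcovhsp output field are the ones discussed in this thread; --query-cover is an additional DIAMOND option that, as far as I know, filters on query coverage, so please check it against diamond help for your version.

```python
def build_blastp_command(query, db, out, min_identity=50, min_query_cover=50):
    """Return a diamond blastp command as an argument list (nothing is run)."""
    return [
        "diamond", "blastp",
        "--query", query,                 # protein FASTA, e.g. TransDecoder output
        "--db", db,                       # database built with diamond makedb
        "--out", out,                     # tabular output file
        "--id", str(min_identity),        # minimum percent identity
        "--query-cover", str(min_query_cover),  # assumed coverage filter, see lead-in
        # Tabular format with an explicit field list that includes qcovhsp.
        "--outfmt", "6", "qseqid", "sseqid", "pident", "length",
        "evalue", "bitscore", "qcovhsp",
    ]


# Hypothetical file names; only the --db value changes between databases.
cmd_nr = build_blastp_command("proteins.pep", "nr", "hits_nr.tsv")
cmd_sprot = build_blastp_command("proteins.pep", "uniprotDB", "hits_sprot.tsv")
```

The list can be passed to subprocess.run(cmd, check=True) or joined with spaces and pasted into a shell.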

bbuchfink avatar Nov 27 '23 15:11 bbuchfink

Oh wait... when I tried to use diamond to search against a previously downloaded UniProt database, it said: Opening the database... Error: This executable was not compiled with support for BLAST databases. Does that mean I need to do some setup before using diamond? Is diamond makedb --taxonmap prot.accession2taxid.FULL.gz --taxonnodes nodes.dmp --taxonnames names.dmp --in uniprot_sprot.fasta --db uniprotDB the right command for that? And do you know whether it would be all right to use the NCBI nr protein accession IDs, taxonnodes and taxonmap files when making the uniprot_sprot database? I just can't find the UniProt taxon info on their website... Thanks again!

PS: I am using the latest version of diamond.

terancehhwong avatar Nov 28 '23 05:11 terancehhwong

This executable was not compiled with support for BLAST databases

means you have a BLAST database of that name in your directory

diamond makedb --taxonmap prot.accession2taxid.FULL.gz --taxonnodes nodes.dmp --taxonnames names.dmp --in uniprot_sprot.fasta --db uniprotDB

You can build a database from a fasta file like this. The NCBI mapping files may work for this swissprot fasta file, but I'm not sure.

bbuchfink avatar Nov 29 '23 16:11 bbuchfink

I see. I also have a separate question: can we set a value for the length of the peptide when we run blastp? Thanks

terancehhwong avatar Dec 13 '23 10:12 terancehhwong

I'm not sure what you mean, the lengths of the sequences are stored in the input fasta files.

bbuchfink avatar Dec 13 '23 14:12 bbuchfink

Oh, I meant the alignment length. For example, for percent identity you can set a threshold before running the search (with the parameter --id, e.g. --id 50) so that the output file only contains the unigenes that have at least 50 percent identity with sequences from the database (hits with less than 50 percent identity are then not included in the output file). Similarly, I wonder whether there is a parameter for requiring the alignment length (i.e. the "length" listed among the 12 preconfigured fields) to be above a certain threshold. Sorry for the confusion, and thanks again.

terancehhwong avatar Dec 14 '23 06:12 terancehhwong

No, there is no such setting.
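Since there is no search-time option for this, one workaround is to filter the tabular output afterwards. The sketch below assumes the default 12-column tabular layout, where the alignment length is the fourth field; the function name and the sample rows are made up for illustration.

```python
LENGTH_COLUMN = 3  # 0-based index of "length" in the default 12-column layout


def filter_by_alignment_length(lines, min_length):
    """Yield tab-separated hit lines whose alignment length is >= min_length."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if int(fields[LENGTH_COLUMN]) >= min_length:
            yield line


# Made-up rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
sample = [
    "q1\ts1\t75.0\t120\t30\t0\t1\t120\t5\t124\t1e-50\t200\n",
    "q2\ts2\t90.0\t40\t4\t0\t1\t40\t10\t49\t1e-10\t80\n",
]
kept = list(filter_by_alignment_length(sample, 100))
```

For a real run, pass an open file handle instead of the sample list. If a custom field list was used for the output, LENGTH_COLUMN has to be adjusted to wherever "length" appears.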

bbuchfink avatar Dec 19 '23 14:12 bbuchfink