DIAMOND: difference between qcovs (NCBI BLAST) and qcovhsp (DIAMOND)

Hi there,

Sorry if this is a stupid question. I am wondering what the difference is between qcovs from NCBI BLAST and qcovhsp from DIAMOND. What I am doing is running blastp with TransDecoder-predicted protein-coding sequences derived from trinity.fasta against a DIAMOND database built from NCBI nr, specifically targeting sequences with a query coverage (qcovs in NCBI BLAST) of at least 50%. What parameter should I set in DIAMOND for that purpose? And will qcovhsp give a similar percentage to qcovs?

Also, if I would like to filter out results whose percent identity (pident) is lower than 50%, is there a parameter I can set for that, e.g. pident -50?

Thanks a lot!

Best regards, Terance

Nov 27 '23 06:11 terancehhwong

qcovs and qcovhsp are the same most of the time. When there are multiple HSPs in the same subject, qcovs is the combined coverage while qcovhsp is the coverage of the individual HSP. Diamond only supports qcovhsp. You can set a sequence identity filter using --id.
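
To make the distinction concrete, here is a minimal Python sketch with made-up numbers. It assumes a 100-residue query with two HSPs on the same subject; the function names and coordinates are hypothetical, and the interval merge is only an illustration of "combined coverage", not code taken from BLAST or DIAMOND.

```python
def hsp_coverage(qstart, qend, qlen):
    """Percent of the query covered by one HSP (1-based, inclusive coordinates)."""
    return 100.0 * (qend - qstart + 1) / qlen


def combined_coverage(hsps, qlen):
    """Percent of the query covered by the union of all HSPs on one subject."""
    covered = 0
    current_start, current_end = None, None
    for qstart, qend in sorted(hsps):
        if current_end is None or qstart > current_end + 1:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = qstart, qend
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, qend)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / qlen


# Hypothetical query of length 100 with two HSPs against the same subject.
hsps = [(1, 40), (61, 100)]
per_hsp = [hsp_coverage(s, e, 100) for s, e in hsps]
combined = combined_coverage(hsps, 100)
print(per_hsp)   # coverage of each HSP on its own (qcovhsp-like)
print(combined)  # coverage of both HSPs together (qcovs-like)
```

In this toy case each HSP alone covers 40% of the query while the two together cover 80%, so a 50% cutoff applied per HSP would drop a subject that a combined-coverage cutoff would keep.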

Nov 27 '23 10:11 bbuchfink

I see, thanks! I would also like to ask whether diamond blastp can be used to search against other common databases, such as Trinotate, KEGG, etc., the way the NCBI blastp tool can. If so, would the command and parameters be the same as when searching against a DIAMOND nr database? Thanks again!

Nov 27 '23 15:11 terancehhwong

Sure, you can use any database with the same command line.
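
As a rough sketch of what "the same command line" could look like, the Python snippet below builds one diamond blastp call and reuses it for two databases, changing only the --db value. The file and database names are hypothetical. The thresholds use --id (mentioned above) and --query-cover, which to my understanding filters on per-HSP query coverage; please check the options against `diamond help` for your version.

```python
import shlex


def build_blastp_command(query, db, out, min_identity=50, min_query_cover=50):
    """Return a DIAMOND blastp argument list; only `db` changes between databases."""
    return [
        "diamond", "blastp",
        "--query", query,
        "--db", db,
        "--out", out,
        "--id", str(min_identity),                # minimum percent identity
        "--query-cover", str(min_query_cover),    # minimum query coverage (assumed per HSP)
        "--outfmt", "6", "qseqid", "sseqid", "pident", "length",
        "qcovhsp", "evalue", "bitscore",
    ]


# Hypothetical file names; swap in your own query file and databases.
for db in ("nr", "uniprotDB"):
    cmd = build_blastp_command("transdecoder.pep", db, f"hits_{db}.tsv")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps the thresholds in one place, so the nr and UniProt searches stay comparable.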

Nov 27 '23 15:11 bbuchfink

Oh wait... when I tried to use DIAMOND to search against a previously downloaded UniProt database, it said:

Opening the database... Error: This executable was not compiled with support for BLAST databases.

Does that mean I need to do some setup before using DIAMOND? Is

diamond makedb --taxonmap prot.accession2taxid.FULL.gz --taxonnodes nodes.dmp --taxonnames names.dmp --in uniprot_sprot.fasta --db uniprotDB

the right command for that? Also, do you know if it would be all right to use the NCBI nr protein accession, taxonnodes and taxonmap files when making the uniprot_sprot database? I can't seem to find the UniProt taxonomy info on their website. Thanks again!

PS: I am using the latest version of DIAMOND.

Nov 28 '23 05:11 terancehhwong

"This executable was not compiled with support for BLAST databases" means you have a BLAST database of that name in your directory.

diamond makedb --taxonmap prot.accession2taxid.FULL.gz --taxonnodes nodes.dmp --taxonnames names.dmp --in uniprot_sprot.fasta --db uniprotDB

You can build a database from a FASTA file like this. The NCBI mapping files may work for this Swiss-Prot FASTA file, but I'm not sure.
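
If it helps to track down the name clash behind that error, here is a small Python sketch that looks for BLAST-style files sharing the database prefix. The extension list is an assumption based on the usual protein BLAST database file names, and the helper name is hypothetical; it does not reproduce DIAMOND's own check.

```python
import tempfile
from pathlib import Path

# Assumption: common file extensions of a protein BLAST database.
BLAST_PROTEIN_EXTENSIONS = (".pin", ".phr", ".psq", ".pal")


def find_blast_db_files(db_prefix):
    """List files next to `db_prefix` that look like parts of a BLAST protein database."""
    prefix = Path(db_prefix)
    return sorted(
        str(prefix.parent / (prefix.name + ext))
        for ext in BLAST_PROTEIN_EXTENSIONS
        if (prefix.parent / (prefix.name + ext)).exists()
    )


# Demo in a throwaway directory with a fake index file (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "uniprot_sprot.pin").touch()
    print(find_blast_db_files(Path(tmp) / "uniprot_sprot"))  # one match
    print(find_blast_db_files(Path(tmp) / "uniprotDB"))      # no match
```

If the list is non-empty for the name passed to --db, choosing a different name for the DIAMOND database (or moving the BLAST files) should avoid the clash.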

Nov 29 '23 16:11 bbuchfink

I see. I also have a separate question: can we set a value for the length of the peptide when we perform blastp? Thanks

Dec 13 '23 10:12 terancehhwong

I'm not sure what you mean; the lengths of the sequences are stored in the input FASTA files.

Dec 13 '23 14:12 bbuchfink

Oh, I meant the alignment length. For example, for percent identity you can set a threshold (with the parameter --id, e.g. --id 50) before running the search, so that the output file only contains unigenes with at least 50 percent identity to sequences in the database (sequences with less than 50 percent identity are not included in the output file). Similarly, I wonder whether there is a parameter that can be used to require the alignment length (i.e. the "length" column, one of the 12 preconfigured fields) to be above a certain threshold? Sorry for the confusion and thanks again.

Dec 14 '23 06:12 terancehhwong

No, there is no such setting.
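
One possible workaround is to filter the tabular output after the search. The Python sketch below assumes the default 12-column tabular layout (qseqid, sseqid, pident, length, ...), where "length" is the fourth column; the function name and the example rows are made up.

```python
import csv
import io

LENGTH_COLUMN = 3  # 0-based index of "length" in the default 12-column tabular output


def filter_by_alignment_length(lines, min_length):
    """Yield tab-separated rows whose alignment length is at least `min_length`."""
    for row in csv.reader(lines, delimiter="\t"):
        if row and int(row[LENGTH_COLUMN]) >= min_length:
            yield row


# Two made-up hits in the default 12-field layout.
example = io.StringIO(
    "q1\ts1\t75.0\t120\t30\t0\t1\t120\t5\t124\t1e-50\t200\n"
    "q2\ts2\t90.0\t35\t3\t0\t1\t35\t10\t44\t1e-10\t60\n"
)
kept = list(filter_by_alignment_length(example, min_length=100))
print(kept)
```

For a real result file, pass an open file handle instead of the in-memory example. If a custom --outfmt field list was used, LENGTH_COLUMN has to be adjusted to wherever "length" appears.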

Dec 19 '23 14:12 bbuchfink