comparison of peptide groups

Open ofleitas opened this issue 8 months ago • 1 comments

Hello

I am trying to compare several sets of peptides (8-50 aa) against one set used as a reference. My goal is to detect the peptides that each group shares with the reference group, so I use the following parameters: --id 100 --query-cover 100 --subject-cover 100 --max-target-seqs 1. I noticed that when I compare the reference group against itself, not all peptides are reported in the results. The reference group has 43000 peptides, so I would expect 43000 peptides to be reported in the self-comparison, but only 36807 sequences were reported.

The command I used was

reference vs. reference

diamond blastp \
  --query ref_db.fasta \
  --db ref_db.dmnd \
  --out ref_vs_ref.tsv \
  --outfmt 6 qseqid sseqid pident qcovhsp scovhsp \
  --id 100 --query-cover 100 --subject-cover 100 \
  --max-target-seqs 1 --ultra-sensitive --masking 0 --threads 8

I also tried this command, but I still get fewer than 43000 peptides.

reference vs. reference

diamond blastp \
  --query ref_db.fasta \
  --db ref_db.dmnd \
  --out ref_vs.ref.tsv \
  --outfmt 6 qseqid sseqid pident qcovhsp scovhsp \
  --id 100 --query-cover 100 --subject-cover 100 \
  --max-target-seqs 1 --ultra-sensitive --masking 0 --threads 8 --shape-mask 11111

What can I do?

Best, Osmel

ofleitas • Apr 01 '25 14:04

Several other filters can prevent short alignments from being reported, including the e-value threshold (-e), the Hamming distance filter (--id2), the ungapped filters (--ungapped-evalue-short and --ungapped-evalue), and the gapped filter (--gapped-filter-evalue).
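Before adjusting any of these options, it may help to see exactly which queries produced no hit, for example to check whether the shortest peptides are the ones being dropped. The following is a minimal sketch and not part of DIAMOND itself: the function name `missing_queries` is hypothetical, and it assumes that `qseqid` is the first column of the tabular output (as in the `--outfmt 6` line used in this thread) and that each sequence ID is the first whitespace-delimited token of its FASTA header.

```shell
# Hypothetical helper: print query IDs that occur in a FASTA file but not in
# column 1 (assumed to be qseqid) of a tab-separated DIAMOND result file.
missing_queries() {
    fasta=$1
    tsv=$2
    all_ids=$(mktemp)
    hit_ids=$(mktemp)
    # Header lines start with ">"; keep the first token without the ">".
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$fasta" | LC_ALL=C sort -u > "$all_ids"
    # Unique query IDs that have at least one reported alignment.
    cut -f1 "$tsv" | LC_ALL=C sort -u > "$hit_ids"
    # comm -23 keeps lines that appear only in the first sorted file.
    LC_ALL=C comm -23 "$all_ids" "$hit_ids"
    rm -f "$all_ids" "$hit_ids"
}

# Example call, using the file names from this thread:
#   missing_queries ref_db.fasta ref_vs_ref.tsv > missing_ids.txt
```

The resulting list can then be compared against peptide lengths to see whether the missing self-hits are concentrated among the shortest sequences.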

bbuchfink • May 28 '25 10:05