comparison of peptide groups
Hello
I am trying to compare different sets of peptides (8-50 aa), using one of them as a reference. My goal is to detect the peptides that each group shares with the reference group, so I use the following parameters: --id 100 --query-cover 100 --subject-cover 100 --max-target-seqs 1. I noticed that when I compare the reference group with itself, not all peptides are reported in the results. The reference group has 43000 peptides, so I would expect 43000 peptides to be reported in the self-comparison, but only 36807 sequences were reported.
The command I used was:
# reference vs. reference
diamond blastp \
  --query ref_db.fasta \
  --db ref_db.dmnd \
  --out ref_vs_ref.tsv \
  --outfmt 6 qseqid sseqid pident qcovhsp scovhsp \
  --id 100 --query-cover 100 --subject-cover 100 \
  --max-target-seqs 1 --ultra-sensitive --masking 0 --threads 8
I also tried this command, but I still get fewer than 43000 peptides:
# reference vs. reference
diamond blastp \
  --query ref_db.fasta \
  --db ref_db.dmnd \
  --out ref_vs.ref.tsv \
  --outfmt 6 qseqid sseqid pident qcovhsp scovhsp \
  --id 100 --query-cover 100 --subject-cover 100 \
  --max-target-seqs 1 --ultra-sensitive --masking 0 --threads 8 --shape-mask 11111
What can I do?
Best, Osmel
Several other settings can prevent short alignments from being reported, including the e-value threshold (-e), the Hamming distance filter (--id2), the ungapped filters (--ungapped-evalue-short and --ungapped-evalue), and the gapped filter (--gapped-filter-evalue). For peptides this short, even a perfect match gets a low score, so any of these thresholds may exclude it.
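To see whether the unreported peptides are mostly the shortest ones, which is what these filters would produce, it may help to list the query IDs that never appear in the output together with their lengths. The following is only a rough sketch: the function name missing_peptides is made up for illustration, and it assumes that column 1 of the TSV is qseqid (as in the --outfmt above) and that each FASTA ID is the first whitespace-delimited token of its header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DIAMOND): print "id<TAB>length" for every
# FASTA record whose ID does not occur in column 1 of a tabular output file.
missing_peptides() {
    local fasta="$1" tsv="$2"
    awk -F'\t' '
        # First file (TSV): remember every query ID that produced a hit.
        pass != 2 { seen[$1] = 1; next }
        # Second file (FASTA): a header line closes the previous record.
        /^>/ {
            if (id != "" && !(id in seen)) print id "\t" len
            split(substr($0, 2), parts, /[ \t]/)
            id = parts[1]; len = 0
            next
        }
        # Sequence line: add its length, ignoring stray whitespace.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (id != "" && !(id in seen)) print id "\t" len }
    ' "$tsv" pass=2 "$fasta"
}

# Example usage with the file names from the question:
#   missing_peptides ref_db.fasta ref_vs_ref.tsv | cut -f2 | sort -n | uniq -c
# This prints how many unreported peptides there are at each length.
```

If the counts pile up at the short end of the 8-50 aa range, that would point to the thresholds listed above rather than to a problem with the input files.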