Inconsistent Percentage of Identity Calculations with 'X' Characters in DIAMOND BLASTP Output
When running the following command:
root@8a6cc3e3fbcb:/opt# diamond blastp -q test.fa -d test.dmnd -o out.tsv --more-sensitive --outfmt 6 qseqid sseqid full_qseq full_sseq pident qlen slen length gapopen --matrix PAM30 --no-self-hits --masking 0
I get the following output:
sequence_0_counts_2260484 sequence_1_counts_226 XXXXSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM XXXQSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM 100 44 44 40 0
sequence_1_counts_226 sequence_0_counts_2260484 XXXQSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM XXXXSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM 97.6 44 44 41 0
Is there a way to take X characters into account when calculating percentage identity, and why are the identities for these two queries different?
Thank you, Nemanja
Is there a way to take into account X characters when calculating percentage identity
no
why the identities for these two queries are different?
the sequences are different
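For what it's worth, the two pident values can be reconciled from the length column of the output alone. Here is a minimal Python sketch, assuming pident is the number of identical columns divided by the alignment length (the length field), times 100. The helper name identical_columns is made up for illustration and is not part of DIAMOND.

```python
# Rows copied from the output in the question, reduced to the needed columns.
rows = [
    # (qseqid, sseqid, pident, length)
    ("sequence_0_counts_2260484", "sequence_1_counts_226", 100.0, 40),
    ("sequence_1_counts_226", "sequence_0_counts_2260484", 97.6, 41),
]


def identical_columns(pident: float, length: int) -> int:
    """Estimate identical columns, assuming pident = 100 * identical / length."""
    return round(pident * length / 100)


for qseqid, sseqid, pident, length in rows:
    n = identical_columns(pident, length)
    print(f"{qseqid} vs {sseqid}: {n}/{length} identical, {length - n} mismatched")
```

Under that assumption both alignments contain 40 identical columns, and the second one is simply one column longer (41 vs 40) with a single mismatch. That would be consistent with the Q at position 4 of sequence_1_counts_226 being included in the alignment against an X in one direction but not the other, though this is an inference from the numbers and not something the output states directly.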