Inconsistent Percentage of Identity Calculations with 'X' Characters in DIAMOND BLASTP Output

Open nvucic opened this issue 1 year ago • 1 comments

When running the following command:

root@8a6cc3e3fbcb:/opt# diamond blastp -q test.fa -d test.dmnd -o out.tsv --more-sensitive --outfmt 6 qseqid sseqid full_qseq full_sseq pident qlen slen length gapopen --matrix PAM30 --no-self-hits --masking 0

I got the following output:

sequence_0_counts_2260484 sequence_1_counts_226 XXXXSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM XXXQSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM 100 44 44 40 0
sequence_1_counts_226 sequence_0_counts_2260484 XXXQSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM XXXXSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM 97.6 44 44 41 0

Is there a way to take X characters into account when calculating percentage identity, and why are the identities for these two queries different?

Thank you, Nemanja

Feb 23 '24 21:02 nvucic

Is there a way to take into account X characters when calculating percentage identity

why the identities for these two queries are different?

The sequences are different.

Mar 04 '24 14:03 bbuchfink
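
The two reported identities can be reproduced from the numbers in the output above. The following is a minimal sketch, not DIAMOND's implementation. It assumes that pident is the number of identical positions divided by the alignment length, that the alignments are ungapped (gapopen is 0), and that each alignment runs to the end of both sequences; the start and end coordinates were not part of the requested output, so that last point is a guess. The helper name `pident_for_suffix` is hypothetical.

```python
# Sequences copied from the full_qseq / full_sseq columns of the output above.
SEQ_0 = "XXXXSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM"  # sequence_0_counts_2260484
SEQ_1 = "XXXQSFFPILSYYSMSIYPSYGYTYXXXXXXXXXSHYGVWYGAM"  # sequence_1_counts_226


def pident_for_suffix(query: str, subject: str, length: int) -> float:
    """Percent identity over the last `length` residues of both sequences.

    Assumption: the ungapped alignment ends at the last residue of each
    sequence, and identical characters (including X vs X) count as matches.
    """
    q = query[-length:]
    s = subject[-length:]
    identical = sum(a == b for a, b in zip(q, s))
    return round(100.0 * identical / length, 1)


# Row 1: query sequence_0, reported alignment length 40.
row1 = pident_for_suffix(SEQ_0, SEQ_1, 40)
# Row 2: query sequence_1, reported alignment length 41.
row2 = pident_for_suffix(SEQ_1, SEQ_0, 41)

print(row1, row2)
```

Under these assumptions the sketch gives 100.0 for the 40-residue alignment and 97.6 for the 41-residue alignment, matching the reported values: the longer alignment extends one position further left and picks up the single X/Q mismatch at position 4. Adding the qstart, qend, sstart, send, nident and mismatch fields to `--outfmt 6` would show the actual alignment coordinates and identity counts instead of relying on the guess made here.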