RaptorX-3DModeling

Meff Calculation for a3m format

Open Maryam-Haghani opened this issue 1 year ago • 0 comments

Hi!

I am confused about how your code calculates Meff for files in the a3m format, where 'gaps aligned to inserts' are omitted. Specifically, the code appears to treat matches (uppercase characters) and inserts (lowercase characters) identically, which results in a higher Meff value for the file.

To illustrate the issue, consider the following example using two sequences in the a3m format:

Sequence 1: HCTTKFCDYKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV
Sequence 2: HCTTKFCDYgKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV

At position 10 of the second sequence there is a lowercase 'g', which is an insert. Because the first sequence has no 'gap aligned to insert' at the corresponding position, a position-by-position comparison sees every subsequent residue of the second sequence shifted one position to the right. These shifted residues are then counted as mismatches, even though the two sequences are in fact identical there. As a result, the number of dissimilarities increases, leading to an inflated Meff value for the MSA file.
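To make the effect concrete, here is a minimal sketch of the two ways of comparing the sequences above. It is not the RaptorX-3DModeling code. The helper names (`strip_inserts`, `identity`, `meff`) are hypothetical, and the Meff definition used (each sequence weighted by one over the number of sequences at or above 80% identity to it) is an assumption on my part about what is being computed.

```python
SEQ1 = "HCTTKFCDYKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV"
SEQ2 = "HCTTKFCDYgKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV"


def strip_inserts(seq):
    """Drop lowercase insert states so only a3m match columns remain."""
    return "".join(c for c in seq if not c.islower())


def identity(a, b):
    """Fraction of identical characters, compared position by position."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def meff(seqs, threshold=0.8):
    """Assumed Meff: sum over sequences of 1 / (number of sequences,
    including itself, with identity >= threshold)."""
    total = 0.0
    for a in seqs:
        neighbors = sum(identity(a, b) >= threshold for b in seqs)
        total += 1.0 / neighbors
    return total


# Inserts treated like matches: the 'g' shifts everything after position 9.
naive = [SEQ1, SEQ2.upper()]
# Inserts removed first: both sequences reduce to the same match columns.
stripped = [strip_inserts(s) for s in (SEQ1, SEQ2)]

print(identity(*naive), meff(naive))
print(identity(*stripped), meff(stripped))
```

Under these assumptions the shifted comparison falls well below the 80% threshold, so the two sequences count as two independent sequences, whereas after removing the insert they are identical and count as one.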

Could you kindly explain the rationale behind this?

Maryam-Haghani, Jul 13 '23 21:07