RaptorX-3DModeling
Meff Calculation for a3m format
Hi!
I am confused about how your code calculates Meff in the a3m format when 'gaps aligned to inserts' are omitted. Specifically, it appears that the code treats matches (uppercase characters) and inserts (lowercase characters) in the same manner, and this results in a higher Meff value for the file.
To illustrate the issue, consider the following example using two sequences in the a3m format:
Sequence 1: HCTTKFCDYKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV
Sequence 2: HCTTKFCDYgKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV
At position 10 of the second sequence there is a lowercase 'g', which is an insert. Because no 'gap aligned to insert' is added at the corresponding position of the first sequence, every subsequent residue of the second sequence is shifted one position to the right when the two strings are compared character by character. These shifted residues are then counted as mismatches even though they are in fact identical, so the number of dissimilarities increases and the Meff value for the MSA file is inflated.
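To make the effect concrete, here is a minimal sketch in Python. It is not the RaptorX code. The helper names (`strip_inserts`, `identity`, `meff`) are hypothetical, and the 0.7 identity threshold is only an assumption for illustration. It computes a simple Meff (each sequence weighted by one over its number of neighbours at or above the threshold), once on the raw a3m strings and once after dropping the lowercase insert characters.

```python
def strip_inserts(seq):
    """Drop a3m insert states (lowercase letters); keep match residues and '-' gaps."""
    return "".join(c for c in seq if not c.islower())


def identity(a, b):
    """Fraction of identical characters, compared position by position."""
    n = min(len(a), len(b))
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / n


def meff(seqs, threshold=0.7):
    """Simple Meff: sum over sequences of 1 / (number of sequences with
    identity >= threshold, the sequence itself included).
    The threshold value is an assumption made for this sketch."""
    total = 0.0
    for a in seqs:
        neighbours = sum(1 for b in seqs if identity(a, b) >= threshold)
        total += 1.0 / neighbours
    return total


seq1 = "HCTTKFCDYKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV"
seq2 = "HCTTKFCDYgKAAGAEEYAQQEVVKRSYGKAFKLSISALFVTPKTAGAQVV"

raw = [seq1, seq2]
cleaned = [strip_inserts(s) for s in raw]

meff_raw = meff(raw)          # inserts treated like match columns
meff_cleaned = meff(cleaned)  # inserts removed before comparison

print("Meff on raw a3m strings:", meff_raw)
print("Meff after removing inserts:", meff_cleaned)
```

With these two sequences, the raw comparison treats them as two dissimilar sequences (Meff of 2.0), whereas after removing the insert they are identical and count as one effective sequence (Meff of 1.0).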
Could you kindly explain the rationale behind this?