Using Masked Marginal Score for ESM-2 as a Scoring Method

Open Amelie-Schreiber opened this issue 1 year ago • 3 comments

In the paper "Language models enable zero-shot prediction of the effects of mutations on protein function", the ESM folks introduce the "masked marginal" scoring method for computing the effects of mutations on function, and show that it performs significantly better than the log-likelihood ratio (LLR) method. If I am not mistaken, EvoProtGrad currently uses LLR. Could the code from the ESM GitHub repository (where they use ESM-1v) be adapted to ESM-2 and used in EvoProtGrad as a scoring method? In particular, could the masked marginal scoring method found here be modified to work with ESM-2? The masked marginal score is defined as

$$ \sum_{i \in M} \left[ \log p(x_i = x_i^{mt} \mid x_{-M}) - \log p(x_i = x_i^{wt} \mid x_{-M}) \right] $$

in the paper above, in Appendix A (bottom of page 18), where $M$ is the set of mutated positions and $x_{-M}$ denotes the sequence with every position in $M$ masked. That is, they mask all of the mutated positions at once and score a mutation by the log-probability of the mutant amino acid relative to the wild-type amino acid at each masked position. This might significantly improve the scoring and could be a nice alternative scoring strategy.
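
To make the request concrete, here is a minimal, model-agnostic sketch of the score as I read the formula above. It is not the ESM repository's code and not an EvoProtGrad API: `masked_marginal_score`, the `log_prob_fn` interface, the `MASK` placeholder, and `toy_log_probs` are all hypothetical names of mine, and the toy model only stands in for a real ESM-2 forward pass.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # placeholder; the real mask token depends on the tokenizer

# A mutation is (0-indexed position, wild-type residue, mutant residue).
Mutation = Tuple[int, str, str]
# Hypothetical model interface: takes the masked token list and returns,
# for every position, a dict mapping residue -> log-probability.
LogProbFn = Callable[[List[str]], List[Dict[str, float]]]


def masked_marginal_score(
    wt_seq: str, mutations: Sequence[Mutation], log_prob_fn: LogProbFn
) -> float:
    """Sum over mutated positions of log p(mutant) - log p(wild type),
    with all mutated positions masked in a single model call."""
    tokens = list(wt_seq)
    for pos, wt, _ in mutations:
        if wt_seq[pos] != wt:
            raise ValueError(f"expected {wt} at position {pos}, found {wt_seq[pos]}")
        tokens[pos] = MASK  # mask every mutated position at once
    log_probs = log_prob_fn(tokens)
    return sum(log_probs[pos][mt] - log_probs[pos][wt] for pos, wt, mt in mutations)


def toy_log_probs(tokens: List[str]) -> List[Dict[str, float]]:
    """Stand-in for a masked language model (assumption, for illustration only):
    puts probability 0.5 on 'A' and spreads the rest evenly, at every position."""
    other = math.log(0.5 / (len(AMINO_ACIDS) - 1))
    row = {aa: (math.log(0.5) if aa == "A" else other) for aa in AMINO_ACIDS}
    return [dict(row) for _ in tokens]


# Score a double mutant K2A / T3A of a short made-up sequence.
score = masked_marginal_score("MKTV", [(1, "K", "A"), (2, "T", "A")], toy_log_probs)
```

With ESM-2, `log_prob_fn` would presumably wrap one forward pass over the masked sequence followed by a log-softmax over the vocabulary. I have left that wrapper out because its details depend on how EvoProtGrad loads the model and tokenizer.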

Amelie-Schreiber, Dec 02 '23 19:12