Model with 2 states for amino acids?

Open odannis opened this issue 3 years ago • 1 comments

Hi,

I would like to infer a model on a generated data set where the "proteins" have only 2 states available at each site. For example:

MSA = [[1, 0, 1, ..., 0, 1, 0],
       ...,
       [0, 0, 1, ..., 1, 1, 0]]

Would it be possible to infer a model with only 2 states for each amino acid? Furthermore, could I use CCMgen to generate data with only 2 states based on the inferred fields?

Thank you for your help

odannis avatar Jul 11 '21 14:07 odannis

Hi @odannis,

Sorry for the long wait, I have been busy lately. Generally speaking, the two-state MRF is a special case of the standard 20-state MRF in which all the singleton potentials corresponding to the 18 "forbidden" states are set to -infinity.
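
As a rough illustration of the special case described above, here is a small NumPy sketch for a single site, ignoring couplings. The state indices and potential values are made up for the example, and `site_distribution` is a hypothetical helper, not part of CCMgen. It shows that putting -infinity on the 18 unused states leaves exactly the two-state distribution:

```python
import numpy as np

N_STATES = 20     # alphabet size of the standard model, as stated above
ALLOWED = [0, 1]  # assumption: the two real states are mapped to indices 0 and 1

def site_distribution(v):
    """Softmax over the singleton potentials of one site (couplings ignored)."""
    # Subtracting the maximum keeps exp() stable; exp(-inf) evaluates to 0.
    w = np.exp(v - np.max(v))
    return w / w.sum()

# Hypothetical potentials: finite for the two allowed states, -inf elsewhere.
v = np.full(N_STATES, -np.inf)
v[ALLOWED] = [0.3, -0.5]

p20 = site_distribution(v)           # 20-state model with 18 forbidden states
p2 = site_distribution(v[ALLOWED])   # plain two-state model

print(np.allclose(p20[ALLOWED], p2))  # the allowed states get the same probabilities
print(p20[2:].sum())                  # the forbidden states carry no probability mass
```

The same argument carries over when couplings are included, since a singleton potential of -infinity makes the corresponding state impossible regardless of the finite coupling terms added to it.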

By default, training should make the singleton potentials of the unused states small (i.e. strongly negative), given that there's no prior on the singletons. You need to be careful with pseudocounts, though! I would not add any, just to be sure.

If the "forbidden" amino acids still pop up during simulation, you could also modify the parameters manually and set the singleton potentials corresponding to the unwanted states to a large negative number.
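
A minimal sketch of that manual fix, assuming the singleton potentials have already been loaded into a NumPy array of shape (L, 20). How to read and write the parameter file is not shown here; `suppress_states`, the penalty value, and the choice of indices 0 and 1 for the allowed states are all assumptions made for illustration:

```python
import numpy as np

def suppress_states(v, allowed, penalty=-1e3):
    """Return a copy of the singleton potentials v (shape: L x n_states)
    with every state not listed in `allowed` set to `penalty`."""
    v = np.array(v, dtype=float, copy=True)   # leave the caller's array untouched
    forbidden = np.ones(v.shape[1], dtype=bool)
    forbidden[list(allowed)] = False          # keep only the allowed columns
    v[:, forbidden] = penalty                 # large negative value for the rest
    return v

# Demo on random potentials for 5 sites (stand-in for trained parameters).
rng = np.random.default_rng(0)
v_demo = rng.normal(size=(5, 20))
v_fixed = suppress_states(v_demo, allowed=[0, 1])
```

A finite penalty is used instead of -infinity so that the values stay representable in a text file; anything negative enough that exp(penalty) is negligible should behave the same in practice.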

croth1 avatar Jul 16 '21 16:07 croth1