
Predicting model for CM and MDH dataset

Open Tsinghua-gongjing opened this issue 1 year ago • 2 comments

Hi, thank you for the beautiful work.

ProGen has been applied to generate proteins for the CM and MDH families. In the Methods section, the details are described as:

We computed the AUC in receiver operating characteristic (ROC) curves for predicting binary function labels from model scores. We computed model scores for each sequence in both CM and MDH by using the per-token model log-likelihood in Eq. 2.

Does this mean that (1) the log-likelihood is calculated for each token of each sequence, and (2) a classifier model is then trained to predict whether the whole sequence is reactive or not (with labels taken from the experimental data), using the per-token log-likelihood scores as features? Could you please also release the data, code, and models for this part?
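To make the question concrete, here is a minimal sketch of the other possible reading, in which no separate classifier is trained: the per-token log-likelihoods are averaged into one score per sequence, and the AUC is computed directly from those scores against the binary labels. This is only my assumption about what the quoted text means, not the authors' confirmed procedure. The function names, the toy log-probabilities, and the labels below are all hypothetical, and the sketch assumes that a higher log-likelihood should indicate a functional sequence.

```python
from typing import Sequence


def mean_log_likelihood(token_log_probs: Sequence[float]) -> float:
    """Collapse per-token log-probabilities into one score per sequence."""
    return sum(token_log_probs) / len(token_log_probs)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the fraction of (positive, negative) pairs in which
    the positive sequence has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-token log-probabilities for four sequences (not real data).
per_token = [
    [-1.0, -1.2, -0.8],
    [-2.5, -2.0, -3.0],
    [-1.5, -1.5],
    [-3.0, -3.0],
]
# Hypothetical binary function labels (1 = functional, 0 = non-functional).
labels = [1, 1, 0, 0]

scores = [mean_log_likelihood(t) for t in per_token]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.2f}")
```

Under this reading the model score itself acts as the predictor, so the only inputs needed are the per-sequence scores and the experimental labels.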

Best regards

Tsinghua-gongjing, Apr 27 '23 07:04

Could you please release the corresponding data for the CM/MDH generation?

Best, Liguo

donglg1309, Jun 27 '23 03:06

I don't understand what GB1 (top100avg) is. How is it calculated?

image

wenyuhaokikika, Dec 27 '23 13:12