GDPO

Clarification on the DS Score Normalization

Open sujinyun999 opened this issue 11 months ago • 0 comments

Hi.

I hope this message finds you well.

Section 5.2 of the paper explains (following prior research, and as shown in the figure) that the DS score is normalized by dividing it by 20. However, in the code, in scorer/evaluate.py and specifically in the gen_score_list function (lines 65 to 74), the DS score appears to be normalized by dividing it by 10. Could you clarify which DS score normalization is intended?

Best, Sujin

Image

```python
def gen_score_list(protein,smiles,train_fps=None,weight_list=None):
    ...
    df = df[~df[protein].isin([-1])]
    dsscore = np.clip(df[protein],0,20)/10
    novelscore=1-df["sim"]
    df['qed'] = get_scores('qed', df['mol'])
    qedscore = np.array(df["qed"])
    df['sa'] = get_scores('sa', df['mol'])
    sascore = np.array(df["sa"])
    if weight_list is None:
        score_list = 0.1*qedscore+0.1*sascore+0.4*novelscore+0.4*dsscore
    else:
        score_list = weight_list[0]*qedscore+weight_list[1]*sascore+weight_list[2]*novelscore+weight_list[3]*dsscore

    valid_score_list = score_list.tolist()
```
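
To make the difference concrete, here is a minimal sketch comparing the two normalizations. The raw scores in `raw_ds` are made-up values for illustration, not outputs of the repository, and `ds_weight` only mirrors the default 0.4 weight shown in the snippet above.

```python
import numpy as np

# Hypothetical raw DS values (illustrative only), including values outside [0, 20].
raw_ds = np.array([-3.0, 0.0, 7.5, 10.0, 20.0, 25.0])

# Same clipping as in the snippet above.
clipped = np.clip(raw_ds, 0, 20)

ds_div10 = clipped / 10  # as in the code: values fall in [0, 2]
ds_div20 = clipped / 20  # as described in the paper: values fall in [0, 1]

ds_weight = 0.4  # default DS weight from the snippet above

print("max DS term with /10:", (ds_weight * ds_div10).max())
print("max DS term with /20:", (ds_weight * ds_div20).max())
```

With division by 10, the clipped DS score can reach 2, so under the default weights the DS term can contribute twice as much to the total as it would with division by 20.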

sujinyun999 · Jan 25 '25 10:01