
Early stopping

Open phiweger opened this issue 2 years ago • 3 comments

When using model.design_3stage(), I observe the RMSD decrease at first, but at some point the error increases again with each additional iteration. I suspect the last iteration is used as the design result. Is there a way to use the "best" iteration instead? Thanks for your help.

phiweger avatar Apr 01 '22 14:04 phiweger

The best result is saved during the 3rd stage (when we switch to one_hot). For some very difficult targets, there might not exist a one_hot solution with < 0.5 rmsd... in which case the rmsd will go up.

Or... I need to improve the optimizer. Can you share the example you were trying? :D (If you prefer to keep it private and don't mind sharing it with me, send it here: [email protected])
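For anyone who wants to keep the best iteration themselves, here is a minimal, library-independent sketch of the idea: record the lowest RMSD seen so far together with a copy of the design at that step. `BestTracker`, `rmsd_trace`, and the `{"seq": ...}` state are hypothetical names for illustration only. This is not how ColabDesign stores its best result internally.

```python
import copy
import math


class BestTracker:
    """Remember the lowest-scoring state seen during an optimization run."""

    def __init__(self):
        self.best_score = math.inf
        self.best_state = None
        self.best_step = None

    def update(self, step, score, state):
        """Store a copy of `state` if `score` beats the best so far."""
        if score < self.best_score:
            self.best_score = score
            self.best_state = copy.deepcopy(state)  # copy, so later edits don't leak in
            self.best_step = step
            return True
        return False


# Toy trajectory (made-up numbers): the error drops, then climbs again.
rmsd_trace = [5.1, 3.4, 2.0, 2.6, 3.1]

tracker = BestTracker()
for step, rmsd in enumerate(rmsd_trace):
    # In a real run, `state` would be whatever describes the current design.
    tracker.update(step, rmsd, {"seq": f"design_{step}"})

print(tracker.best_step, tracker.best_score, tracker.best_state)
```

With this pattern the final iteration no longer matters: the tracker holds the state from the step where the error was lowest.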

sokrypton avatar Apr 04 '22 15:04 sokrypton

I tried

model = mk_design_model(protocol="fixbb", model_mode="sample", model_parallel=True, num_models=5, num_recycles=3, recycle_mode="sample")
model.prep_inputs(pdb_filename=get_pdb("2MN5"), chain="A")

That should not be too difficult a protein. The RMSD went down to 2 before climbing back to about 3. Could it be that I am using the wrong settings? Alternatively, the PDB entry for 2MN5 has 20 models, so I am not sure what happens when we only pass chain="A".
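One way to take the multi-model question out of the equation is to inspect the file before handing it over. The sketch below relies only on the standard PDB text format (MODEL/ENDMDL records, chain ID in column 22) and makes no claim about what ColabDesign does with multi-model files. `count_models`, `first_model_chain`, and the toy `demo` text are hypothetical helpers for illustration.

```python
def count_models(pdb_text):
    """Count MODEL records in PDB-format text (0 if the file has no MODEL records)."""
    return sum(1 for line in pdb_text.splitlines() if line[:6].strip() == "MODEL")


def first_model_chain(pdb_text, chain="A"):
    """Return ATOM/HETATM lines of `chain`, stopping at the first ENDMDL."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # everything after this belongs to later models
        # The chain identifier sits in column 22 (index 21) of ATOM/HETATM records.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21 and line[21] == chain:
            kept.append(line)
    return kept


def _atom_line(serial, chain):
    """Build a truncated ATOM record with correct column positions (toy data only)."""
    return f"ATOM  {serial:>5} {' N':<4} GLY {chain}{1:>4}"


# Tiny synthetic two-model file, standing in for an NMR ensemble.
demo = "\n".join([
    "MODEL        1",
    _atom_line(1, "A"),
    _atom_line(2, "B"),
    "ENDMDL",
    "MODEL        2",
    _atom_line(1, "A"),
    "ENDMDL",
])

print(count_models(demo))                 # number of models in the file
print(first_model_chain(demo, chain="A")) # chain A atoms from the first model only
```

Writing the returned lines to a new file gives a single-model, single-chain input, so whatever the loader does with the remaining models no longer affects the result.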

Thank you for your help!

phiweger avatar Apr 04 '22 21:04 phiweger

Wow... This does look like a difficult target! You need 6 disulfides to hold it together: image

Unconstrained design looks like this (if we increase the number of iterations at stage 1, it may find a solution...):

model = mk_design_model(protocol="fixbb")
model.prep_inputs(pdb_filename=get_pdb("2MN5"), chain="A")
model.restart()
model.design_3stage()

image

If you fix certain positions to "C", it's quickly able to find a solution within 2 RMSD and retain it:

seq_init = "XCXXXXXXCXXXXXXXXXXXCXXCXXXXXXCXXXXXXXCXXXXXXXCXCXXXCXXC"
model.restart(seq_init=seq_init, add_seq=True)
model.design_3stage()

image
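A mask like the seq_init string above can be derived from a native sequence rather than typed by hand. This is a small sketch that assumes, as the snippet above suggests, that "X" marks a free position and any other letter fixes that residue. `fixed_residue_mask` and the toy sequence are hypothetical and not part of ColabDesign.

```python
def fixed_residue_mask(native_seq, keep="C", wildcard="X"):
    """Keep residues listed in `keep`; replace every other position with `wildcard`."""
    return "".join(aa if aa in keep else wildcard for aa in native_seq)


# Toy sequence for illustration (not the 2MN5 sequence).
toy_seq = "ACDCE"
mask = fixed_residue_mask(toy_seq)
print(mask)             # same length as the input, cysteines preserved
print(mask.count("C"))  # number of fixed positions
```

The mask always has the same length as the input sequence, so it can be checked against the length of the target chain before starting a design run.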

Increasing the number of recycles and the number of model params sampled per iteration might also help.

PS, the sequence constraints are implemented in the beta version that can be found here: https://github.com/sokrypton/ColabDesign/tree/beta/af

sokrypton avatar Apr 10 '22 16:04 sokrypton