ColabDesign
Early stopping
When using model.design_3stage(), I observe a decrease in RMSD, and at some point the error increases again with each additional iteration. I suspect the last iteration is used as the design result, but is there a way to use the "best" iteration? Thanks for your help.
The best result is saved during the 3rd stage (when we switch to one_hot). For some very difficult targets, there might not exist a one_hot solution with < 0.5 RMSD, in which case the RMSD will go up.
Or... I need to improve the optimizer. Can you share the example you were trying? :D (If you prefer to keep it private and don't mind sharing it with me, send it here: [email protected])
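To illustrate what "saving the best result" means in the reply above, here is a minimal, generic sketch. It is not ColabDesign's implementation: the helper keep_best and the toy trajectory are hypothetical, and it only assumes that each iteration yields a candidate design together with its RMSD.

```python
from typing import Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def keep_best(steps: Iterable[Tuple[T, float]]) -> Tuple[Optional[T], float]:
    """Return the (design, rmsd) pair with the lowest RMSD seen so far.

    Hypothetical helper for illustration; not part of ColabDesign.
    """
    best_design, best_rmsd = None, float("inf")
    for design, rmsd in steps:
        # Only overwrite the stored design when this iteration improves on it.
        if rmsd < best_rmsd:
            best_design, best_rmsd = design, rmsd
    return best_design, best_rmsd


# Toy trajectory (made-up values): RMSD falls to 2.0, then climbs back to 3.0.
trajectory = [("seq_a", 4.1), ("seq_b", 2.0), ("seq_c", 2.6), ("seq_d", 3.0)]
best_design, best_rmsd = keep_best(trajectory)
print(best_design, best_rmsd)
```

With this pattern the final answer is the lowest-RMSD iteration rather than the last one, so a late rise in error does not affect what is returned.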
I tried
model = mk_design_model(protocol="fixbb", model_mode="sample", model_parallel=True, num_models=5, num_recycles=3, recycle_mode="sample")
model.prep_inputs(pdb_filename=get_pdb("2MN5"), chain="A")
This should not be too difficult a protein. The RMSD went down to 2 before climbing back to around 3. Could it be that I am using the wrong settings? Alternatively, the PDB entry for 2MN5 has 20 models, so I am not sure what happens when we only pass chain="A".
Thank you for your help!
Wow... This does look like a difficult target! You need 6 disulfides to hold it together:
Unconstrained design looks like this (if we increase the number of iterations at stage 1, it may find a solution...)
model = mk_design_model(protocol="fixbb")
model.prep_inputs(pdb_filename=get_pdb("2MN5"), chain="A")
model.restart()
model.design_3stage()
If you fix certain positions to "C", it is quickly able to find a solution within 2 RMSD and retain it:
seq_init = "XCXXXXXXCXXXXXXXXXXXCXXCXXXXXXCXXXXXXXCXXXXXXXCXCXXXCXXC"
model.restart(seq_init=seq_init, add_seq=True)
model.design_3stage()
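A seq_init string like the one above can be generated rather than typed by hand. The following is a small sketch, assuming (as in the example above) that "X" marks a free position and any other letter fixes that residue. The helper name cysteine_mask and the toy sequence are hypothetical and not part of ColabDesign.

```python
def cysteine_mask(native_seq: str, keep: str = "C", free: str = "X") -> str:
    """Build a constraint string that keeps selected residues and frees the rest.

    Hypothetical helper: every residue whose one-letter code appears in
    `keep` is copied through; all other positions become `free`.
    """
    return "".join(aa if aa in keep else free for aa in native_seq.upper())


# Toy native sequence (made up for illustration, not the 2MN5 sequence).
toy_native = "ACDEFCGHIC"
seq_init = cysteine_mask(toy_native)
print(seq_init)
```

The resulting string has the same length as the native sequence, so it could be passed the same way as the hand-written seq_init above.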
Increasing the number of recycles and the number of model params sampled per iteration might also help.
PS: the sequence constraints are implemented in the beta version, which can be found here: https://github.com/sokrypton/ColabDesign/tree/beta/af