DD_protocol

what happens to the best model after each iteration?

Open • mgm-14392 opened this issue 1 year ago • 1 comment

Hello! If I understand correctly, at the end of each iteration the best model is selected and used to predict the hit-likeness of all molecules in the library. Then, with the recommended parameters reported in the article, the next iteration trains another 24 models with different hyperparameters. Why can't we simply take the best model from the first iteration and continue training it for another N iterations? What happens to the best model after each iteration? Am I missing something? Thank you!

mgm-14392 • Mar 21 '24 13:03

I may be misunderstanding the question, but if I do get it: to oversimplify, you more or less do exactly this. Going from iteration N to iteration N+1, you acquire additional training data and again perform a hyperparameter search during model training to select the best model for inference.

So iteration 1 has X training data, Y validation data, and Z testing data. In iteration 2, all data collected via docking is added to the training data, and this repeats for iterations 3, 4, etc.
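To make that loop concrete, here is a rough, self-contained toy sketch. It is not the DD_protocol code: `dock`, `label`, `train`, `accuracy`, and `run_iterations` are all hypothetical names, molecules are just numbers in [0, 1], the "model" is a one-parameter interval classifier, and `widths` stands in for the hyperparameter grid. The only things it tries to mirror are the points above: the validation set stays fixed, newly docked molecules are added to the training set each iteration, and the hyperparameter search is rerun from scratch on the enlarged set.

```python
import random

def dock(x):
    """Stand-in for an expensive docking run (lower score = better)."""
    return (x - 0.3) ** 2

def label(x, cutoff=0.01):
    """Dock one molecule and mark it as a hit if its score passes the cutoff."""
    return x, dock(x) <= cutoff

def train(train_set, width):
    """Toy model: call x a hit if it lies within `width` of the mean training hit."""
    hits = [x for x, is_hit in train_set if is_hit]
    center = sum(hits) / len(hits) if hits else 0.5
    return lambda x: abs(x - center) <= width

def accuracy(model, data):
    """Fraction of labeled molecules the model classifies correctly."""
    return sum(model(x) == is_hit for x, is_hit in data) / len(data)

def run_iterations(library, n_iterations=3, sample_size=50,
                   widths=(0.05, 0.1, 0.2, 0.4), seed=0):
    rng = random.Random(seed)
    # Validation data is docked once and reused in every iteration.
    valid = [label(x) for x in rng.sample(library, sample_size)]
    train_set = []
    candidates = list(library)
    history = []
    for iteration in range(1, n_iterations + 1):
        # Dock a fresh sample and ADD it to the existing training data.
        new = rng.sample(candidates, min(sample_size, len(candidates)))
        train_set += [label(x) for x in new]
        # Rerun the hyperparameter search on the enlarged training set.
        models = {w: train(train_set, w) for w in widths}
        best_w = max(widths, key=lambda w: accuracy(models[w], valid))
        # The best model is used for inference only: it narrows the pool
        # that the next iteration samples from (an assumption of this toy).
        candidates = [x for x in candidates if models[best_w](x)]
        history.append((iteration, len(train_set), best_w, len(candidates)))
    return history

if __name__ == "__main__":
    rng = random.Random(1)
    library = [rng.random() for _ in range(2000)]
    for row in run_iterations(library):
        print(row)  # (iteration, training size, chosen width, remaining pool)
```

In this sketch nothing is carried over from the previous best model except its predictions; each iteration starts its search again because the training set has grown.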

Since each iteration acquires more data, the optimal hyperparameters may change, and thus the model is retrained. In my experience, even with datasets of ~10M molecules, model training is very fast. There are methods that do this general step in a different manner, not proceeding in an "iterative" fashion, but I am not very familiar with them. Hope that helps :)

DKchemistry • Aug 28 '25 14:08