ReinventCommunity

Output file scaffold_memory.csv does not contain all the generated molecules

Open • GemaRG96 opened this issue 2 years ago • 1 comment

Hi!

I'm running REINVENT in curriculum learning mode. I now want to analyse the molecules generated during the production phase further, but I cannot find them. As far as I understand, scaffold_memory.csv contains all the molecules collected during the production phase, but it only has 2 entries, coming from 2 different epochs, even though the production phase ran for ~20 epochs. The logs show that more than 100 molecules were generated in each of these 20 epochs, and I can also see their scores in the TensorBoard report. So where can I find these molecules?

And an extra question, if I may: I see 3 different agents in the results folder (Agent.100.ckpt, Agent.200.ckpt and Agent_merge_0.ckpt). Could you please clarify what each of them is, and which one is the agent we get after the production phase?

Some more info on my run:

  • I have only 1 curriculum objective, which was reached at epoch 189.
  • Then I got a "core dumped" error at epoch 211.
  • I'm not using a diversity filter.
  • I see different molecules in the TensorBoard results and in the logs output during the run, so they are not all the same 2 that appear in scaffold_memory.csv.

Let me know if you need any further info.

Thanks in advance!

GemaRG96 • Jul 27 '22 16:07

Hi @GemaRG96,

Sorry for the delayed response. The scaffold memory contains all molecules collected during the production phase with the following 2 caveats:

  1. Molecules are only collected once all curriculum objectives are satisfied and the agent proceeds to the production phase. If the curriculum objectives are not satisfied within the permitted number of epochs, the run stops. (From the extra info about your run, your curriculum objective was satisfied, so this point is just for your information in case it helps in the future.)

  2. Only compounds whose total score (the score obtained by combining all the scoring function objectives) is above the threshold are stored. If you want to store every single compound generated, set the "minimum score" in the diversity filter to 0.
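To make caveat 2 concrete, here is a minimal Python sketch of score-threshold filtering. It is not REINVENT's code: the helper name `collect_for_memory`, the `(epoch, SMILES, total_score)` tuple layout, the sample molecules and scores, and the inclusive `>=` comparison are all illustrative assumptions. It shows how a high threshold can leave only a handful of rows in scaffold_memory.csv even when many molecules are sampled each epoch, and why a threshold of 0 keeps everything, assuming total scores lie in [0, 1].

```python
from typing import List, Tuple

# Hypothetical record layout: (epoch, SMILES, total_score).
Record = Tuple[int, str, float]


def collect_for_memory(records: List[Record], min_score: float) -> List[Record]:
    """Keep only compounds whose total score clears the threshold.

    Assumption: the comparison is inclusive (>=), so min_score=0.0
    keeps every compound with a non-negative total score.
    """
    return [r for r in records if r[2] >= min_score]


# Made-up sampled compounds from a few production-phase epochs.
sampled = [
    (190, "CCO", 0.82),
    (190, "c1ccccc1", 0.35),
    (195, "CC(=O)O", 0.10),
    (201, "CCN", 0.91),
]

print(len(collect_for_memory(sampled, min_score=0.8)))  # 2
print(len(collect_for_memory(sampled, min_score=0.0)))  # 4
```

With a threshold of 0.8 only two of the four sampled compounds survive, which mirrors the situation in the question where most generated molecules never reach the memory file.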

The "logging frequency" parameter controls how often the agent state is saved. Based on the names of the .ckpt files you listed, your logging frequency is 100, so Agent.100.ckpt is the agent state after 100 epochs of REINVENT, Agent.200.ckpt is the agent state after 200 epochs, and so on. Agent_merge_0.ckpt is the agent state at the moment the first curriculum objective was satisfied.
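The naming scheme can be summarised with a small Python sketch that lists the checkpoint files a run would be expected to leave behind. The helper `expected_checkpoints` is hypothetical, and the rule of one `Agent_merge_<i>.ckpt` per satisfied curriculum objective, indexed from 0, is an extrapolation from the single merge file described above rather than confirmed REINVENT behaviour.

```python
from typing import List


def expected_checkpoints(last_epoch: int, logging_frequency: int,
                         merges: int = 0) -> List[str]:
    """List checkpoint file names a run would leave behind (illustrative only)."""
    # Periodic saves: one file every `logging_frequency` epochs, up to last_epoch.
    names = [f"Agent.{epoch}.ckpt"
             for epoch in range(logging_frequency, last_epoch + 1, logging_frequency)]
    # Assumption: one merge checkpoint per satisfied curriculum objective, indexed from 0.
    names += [f"Agent_merge_{i}.ckpt" for i in range(merges)]
    return names


# The run described in the question: stopped at epoch 211, one objective satisfied.
print(expected_checkpoints(last_epoch=211, logging_frequency=100, merges=1))
# ['Agent.100.ckpt', 'Agent.200.ckpt', 'Agent_merge_0.ckpt']
```

Under these assumptions, a run that stopped at epoch 211 with a logging frequency of 100 and one satisfied objective produces exactly the three files listed in the question.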

GuoJeff • Aug 14 '22 15:08