multiobj-rationale
In which part does it incorporate RL?
It's nice work! However, I have a question. Since I'm not so familiar with Reinforcement Learning, I wonder which part of it uses RL. In Section 3.3.2 (fine-tuning), the paper says "Update the model P(G,S) on the fine-tuning set $D^f$ using policy gradient method", which suggests RL is used there. However, in the code, it just computes the topo, atom, and bond type losses between the expanded $S_i$ and $G_i^k$. Thanks!
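Concretely, what I see looks like plain supervised losses, roughly along these lines (my own simplified sketch with made-up tensor names, not the actual code):

```python
import torch.nn.functional as F

# Simplified illustration of "topo, atom and bond type loss": three supervised
# cross-entropy terms on the decoder's predictions while expanding S_i into G_i^k.
def decoder_loss(topo_logits, topo_labels, atom_logits, atom_labels,
                 bond_logits, bond_labels):
    topo_loss = F.cross_entropy(topo_logits, topo_labels)  # expand vs. stop decisions
    atom_loss = F.cross_entropy(atom_logits, atom_labels)  # atom type of each new node
    bond_loss = F.cross_entropy(bond_logits, bond_labels)  # bond type of each new edge
    return topo_loss + atom_loss + bond_loss
```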
Allow me to comment. I don't think you will find algorithms like REINFORCE here. What Jin meant is improving $p(G|S)$ using the desired chemical properties. This happens in the property_filter() method in finetune.py.
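Roughly speaking, that filtering step just keeps the generated molecules whose predicted properties clear the target thresholds. A minimal sketch of the idea (the scorer callables, thresholds, and this exact signature are my assumptions, not the repo's actual API):

```python
from rdkit import Chem

def property_filter(smiles_list, scorers, thresholds):
    """Keep only molecules whose property scores clear every threshold.
    A hypothetical stand-in for the filtering done in finetune.py."""
    kept = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # drop invalid generations
        if all(score(mol) >= thr for score, thr in zip(scorers, thresholds)):
            kept.append(smi)
    return kept
```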
I also went looking for the RL and found none.
For posterity (happy to be corrected by the authors), what the fine-tuning step (and property_filter) seems to consist of is: generate $Nm$ datapoints from the $m$ rationales, filter those points to keep only the ones with the desired properties, and take a step of maximum likelihood on the remaining points, roughly as in the sketch below.
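Here is what one fine-tuning iteration would look like under that reading (a sketch with assumed method names — `model.decode`, `model.log_prob`, and the `keep` predicate are stand-ins, not the repo's actual API):

```python
import torch

def finetune_step(model, optimizer, rationales, num_samples, keep):
    """Sample completions, filter by properties, take one MLE step on the survivors."""
    # 1) Generate num_samples completions G for each of the m rationales S (N*m total).
    pairs = [(s, g) for s in rationales
             for g in model.decode(s, num_samples=num_samples)]
    # 2) Keep only the completions with the desired properties
    #    (the role played by property_filter in finetune.py).
    accepted = [(s, g) for s, g in pairs if keep(g)]
    if not accepted:
        return None
    # 3) One maximum-likelihood step on the surviving (rationale, molecule) pairs.
    loss = -torch.stack([model.log_prob(g, condition=s) for s, g in accepted]).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```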
This might be a totally valid thing to do, but it is indeed far from policy gradient. In the strictest sense, you could interpret this as a baseline-less REINFORCE with $R=0$ for all the rejected points (which would make their gradient 0) and $R=1$ for the points that property_filter keeps. In practice this is not a recommended RL setup, as it has all sorts of instabilities.
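Spelling that interpretation out: with a binary accept/reject reward and no baseline, the REINFORCE estimator collapses to the likelihood gradient over the accepted samples only,

$$
\nabla_\theta J(\theta)
= \mathbb{E}_{G \sim p_\theta(\cdot \mid S)}\big[ R(G)\, \nabla_\theta \log p_\theta(G \mid S) \big]
\approx \frac{1}{N} \sum_{i=1}^{N} R(G_i)\, \nabla_\theta \log p_\theta(G_i \mid S)
= \frac{1}{N} \sum_{i:\, R(G_i) = 1} \nabla_\theta \log p_\theta(G_i \mid S),
$$

which, up to a normalization constant, is the maximum-likelihood gradient on the points that pass the filter. The pieces that make policy gradient behave in practice (a baseline or advantage estimate, reward shaping beyond 0/1, variance control) are all absent, hence the instabilities mentioned above.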