
In which part does it incorporate RL?

YifanDengWHU opened this issue 3 years ago · 2 comments

It's nice work! However, I have a question. Since I'm not so familiar with Reinforcement Learning, I wonder which part of it uses RL. Section 3.3.2 (fine-tuning) says: "Update the model $P(G|S)$ on the fine-tuning set $D^f$ using policy gradient method". It seems that RL is used there. However, in the code, it just computes the topology, atom, and bond type losses between the expanded $S_i$ and $G_i^k$. Thanks!

YifanDengWHU · Apr 01 '21

Allow me to comment. I don't think you will find an algorithm like REINFORCE here. What Jin meant is improving $p(G|S)$ using desired chemical properties. This happens in the property_filter() method in finetune.py.
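For reference, here is a minimal sketch of what such a filter can look like. All names are hypothetical, not the actual finetune.py API:

```python
# Hypothetical sketch of a property filter (illustrative names only,
# not the repository's real implementation).
def property_filter(mols, scorers, thresholds):
    """Keep only molecules whose every property score meets its threshold."""
    return [
        mol for mol in mols
        if all(score(mol) >= t for score, t in zip(scorers, thresholds))
    ]

# Example usage (assumed scoring callables):
# kept = property_filter(candidates, [qed_model, activity_model], [0.6, 0.5])
```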

bhomass · Nov 25 '21

I also went looking for the RL and found none.

For posterity (happy to be corrected by the authors): what the fine-tuning step (and property_filter) seems to do is generate $Nm$ datapoints from the $m$ rationales, filter those points to keep only the ones with the desired properties, and take a maximum-likelihood step on the remaining points.
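A minimal sketch of that loop, assuming hypothetical model.decode / model.log_likelihood helpers (the repo's actual calls differ):

```python
# Sketch of the generate -> filter -> maximum-likelihood step described
# above. All names (model, optimizer, property_filter, ...) are assumptions.
for step in range(num_finetune_steps):
    # Generate N candidates from each of the m rationales (Nm total).
    samples = [model.decode(s) for s in rationales for _ in range(N)]
    # Keep only the candidates with the desired properties.
    positives = property_filter(samples, scorers, thresholds)
    if not positives:
        continue
    # One maximum-likelihood step on the surviving candidates.
    loss = -sum(model.log_likelihood(g) for g in positives) / len(positives)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```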

This might be a totally valid thing to do, but it is indeed far from policy gradient. In the strictest sense, you could interpret this as baseline-less REINFORCE with $R=0$ for all the rejected points (which makes their gradient zero) and $R=1$ for the points that property_filter keeps. In practice this is not a recommended RL setup, as it has all sorts of instabilities.
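To make that reading concrete, here is a sketch (same hypothetical names as above) of baseline-less REINFORCE with a binary reward; since $R=0$ zeroes out the rejected samples' terms, the update collapses to maximum likelihood on the accepted ones:

```python
# Baseline-less REINFORCE with binary reward: R=1 if a sample passes the
# filter, R=0 otherwise. Rejected samples contribute zero gradient, so
# this reduces to maximum likelihood on the accepted samples.
rewards = [1.0 if passes_filter(g) else 0.0 for g in samples]
loss = -sum(r * model.log_likelihood(g) for r, g in zip(rewards, samples))
loss = loss / max(sum(rewards), 1.0)  # average over accepted samples
loss.backward()
```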

bengioe · Aug 01 '22