sequicity
About Reinforcement Learning
First of all, thank you for open-sourcing the code for this wonderful work. I have a question about the reinforcement learning part of the code. In your implementation, the policy gradient step fine-tunes the parameters on the training dataset. In my understanding, however, an RL setup should use a user simulator as the environment for updating the parameters. Could you explain the reason for this choice? Thank you very much!