Matthew Huang (Shao Ran)

6 comments by Matthew Huang (Shao Ran)

Thanks a lot @Miffyli! Do you have any suggestions for the follow-up question too? Thanks again!

Thanks a lot @Miffyli! I will read the paper and see if it helps! In addition, I am trying a very simple custom environment to test the LSTM...

I made the game easier, i.e., the correct action is the observation from 2 steps ago, so that there is a feedback signal guiding the agent toward the solution (`reward = -np.abs(action - self.soln[self.step_count - 2])`). I also...
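The delayed-recall game described in this comment could be sketched as a minimal, library-free environment. The class and parameter names below are illustrative (they are not from the original code); only the reward expression is taken from the comment:

```python
import numpy as np

class DelayedRecallEnv:
    """Toy memory task: each step the agent observes a random integer,
    and the correct action is the observation it saw two steps earlier.
    Reward is the negative absolute error, so 0 per step is optimal."""

    def __init__(self, n_values=4, episode_len=10, seed=0):
        self.n_values = n_values
        self.episode_len = episode_len
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Pre-generate the sequence of observations for the episode.
        self.soln = self.rng.integers(0, self.n_values, size=self.episode_len)
        self.step_count = 1
        return self.soln[0]

    def step(self, action):
        # Target is the observation from two steps ago (no reward before
        # such a target exists).
        if self.step_count >= 2:
            reward = -float(np.abs(action - self.soln[self.step_count - 2]))
        else:
            reward = 0.0
        obs = self.soln[self.step_count] if self.step_count < self.episode_len else 0
        self.step_count += 1
        done = self.step_count >= self.episode_len
        return obs, reward, done, {}
```

An agent with perfect two-step memory collects a total reward of 0 per episode, which makes it easy to check whether the LSTM is actually using its memory.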

Or, more simply, does anyone have example code applying `MlpLstmPolicy` to a custom environment that I may refer to? I feel that I must be missing something trivial...

If I am not mistaken, one thing that might help is to start with Behavior Cloning, available in stable-baselines: https://stable-baselines.readthedocs.io/en/master/guide/pretrain.html#generate-expert-trajectories (though generating the expert trajectories may require some manual tweaking).
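Behavior cloning, at its core, is supervised learning on expert (observation, action) pairs before any RL fine-tuning. The sketch below illustrates only that idea with a hypothetical toy echo task and a tabular policy; it is not stable-baselines' `pretrain` API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "expert trajectories": for a toy echo task, the expert
# action simply equals the observation.
obs = rng.integers(0, 4, size=200)
actions = obs.copy()

# Tabular maximum-likelihood cloning: count expert actions per observed
# state and pick the most frequent one.
n_states, n_actions = 4, 4
counts = np.zeros((n_states, n_actions))
for o, a in zip(obs, actions):
    counts[o, a] += 1
policy = counts.argmax(axis=1)  # greedy cloned policy, one action per state
```

In stable-baselines the same two stages (generate expert data, then fit the policy to it) are handled by the utilities in the pretraining guide linked above, with the cloned network then serving as the starting point for RL.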

I encountered the same problem. I suspect that it may be due to the Windows version (the same code worked on a Windows 7 machine but not on Windows 10...