Matthew Huang (Shao Ran)
Thanks a lot @Miffyli! Do you have any suggestions for the follow-up question too? Thanks again!
Thanks a lot @Miffyli! I will read the paper and see if it helps! In addition, I am trying a very simple custom environment to test the LSTM...
I made the game easier, i.e. the correct action is the observation from 2 steps ago, so that the reward gives feedback toward the solution (`reward = -np.abs(action - self.soln[self.step_count - 2])`). I also...
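In case it helps others reproduce this, here is a minimal sketch of the kind of environment I mean (the class name `DelayedRecallEnv` and the sizes are made up for illustration; the reward line matches the snippet above, and it uses the old gym API that stable-baselines expects):

```python
import numpy as np
import gym
from gym import spaces

class DelayedRecallEnv(gym.Env):
    """Toy memory task: the correct action at each step is the
    observation seen 2 steps earlier, so solving it requires memory."""

    def __init__(self, n_values=4, episode_len=20):
        super(DelayedRecallEnv, self).__init__()
        self.n_values = n_values
        self.episode_len = episode_len
        self.observation_space = spaces.Discrete(n_values)
        self.action_space = spaces.Discrete(n_values)

    def reset(self):
        # Pre-generate the observation sequence; soln[t] is the
        # observation the agent sees at step t.
        self.soln = np.random.randint(self.n_values, size=self.episode_len)
        self.step_count = 0
        return int(self.soln[0])

    def step(self, action):
        # No feedback until two observations are in the history.
        if self.step_count >= 2:
            reward = -np.abs(action - self.soln[self.step_count - 2])
        else:
            reward = 0.0
        self.step_count += 1
        done = self.step_count >= self.episode_len
        obs = int(self.soln[self.step_count]) if not done else 0
        return obs, float(reward), done, {}
```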
Or, more simply, does anyone have any example code applying `MlpLstmPolicy` to a custom environment that I could refer to? I feel that I must be missing something trivial...
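For context, this is roughly what I have so far (a sketch using the toy `DelayedRecallEnv` above, not verified beyond the basics). The gotchas I know of: recurrent policies need a vectorized env, for PPO2 the number of envs must be divisible by `nminibatches`, and the LSTM state has to be carried manually at predict time:

```python
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Recurrent policies require a vectorized env; n_envs must be
# divisible by nminibatches for PPO2 with an LSTM policy.
n_envs = 4
env = DummyVecEnv([lambda: DelayedRecallEnv() for _ in range(n_envs)])

model = PPO2("MlpLstmPolicy", env, nminibatches=n_envs, verbose=1)
model.learn(total_timesteps=100000)

# At test time, carry the LSTM state manually; predict() expects one
# observation per training env, and mask=done resets finished envs.
obs = env.reset()
state = None
done = [False for _ in range(n_envs)]
for _ in range(100):
    action, state = model.predict(obs, state=state, mask=done)
    obs, reward, done, info = env.step(action)
```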
If I am not mistaken, one thing that might help is to start with Behavior Cloning, available in stable-baselines: https://stable-baselines.readthedocs.io/en/master/guide/pretrain.html#generate-expert-trajectories (though generating the expert trajectories may require some manual tweaking).
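Roughly like this, following the pretrain docs (a sketch only: the scripted expert, the `expert_recall` filename, and the toy env from above are my assumptions; also note the docs pretrain a feed-forward `MlpPolicy`, and I am not sure pretraining handles recurrent policies' states correctly):

```python
from stable_baselines import PPO2
from stable_baselines.gail import generate_expert_traj, ExpertDataset

env = DelayedRecallEnv()

# Scripted "expert": keep a running buffer of observations and answer
# with the one from two steps ago (the first two answers are arbitrary,
# which is fine since they earn no reward in this toy task).
history = []
def expert_fn(obs):
    history.append(int(obs))
    return history[-3] if len(history) >= 3 else 0

# Record expert trajectories to expert_recall.npz.
generate_expert_traj(expert_fn, "expert_recall", env=env, n_episodes=50)

# Pretrain a policy on the recorded data, then fine-tune with RL.
dataset = ExpertDataset(expert_path="expert_recall.npz", batch_size=32)
model = PPO2("MlpPolicy", env, verbose=1)
model = model.pretrain(dataset, n_epochs=100)
model.learn(total_timesteps=100000)
```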
I encountered the same problem. I suspect it may be due to the Windows version (the same code worked on a Windows 7 machine but not on Windows 10...