Matthew Huang (Shao Ran)
Thanks a lot @Miffyli! Do you have any suggestions for the follow-up question too? Thanks again!
Thanks a lot @Miffyli! I will read the paper and see if it helps! In addition, I am trying a very simple custom environment to test the LSTM...
I made the game easier, i.e. the correct action is the observation from 2 steps ago, so that the reward gives feedback toward the solution (`reward = -np.abs(action - self.soln[self.step_count - 2])`). I also...
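In case it helps others reproduce this, here is a minimal sketch of the kind of environment I mean (the class name `DelayedRecallEnv` and the sizes are made up for illustration; the reward line matches the snippet above, and it uses the old gym API that stable-baselines expects):

```python
import numpy as np
import gym
from gym import spaces

class DelayedRecallEnv(gym.Env):
    """Toy memory task: the correct action at each step is the
    observation seen 2 steps earlier, so solving it requires memory."""

    def __init__(self, n_values=4, episode_len=20):
        super(DelayedRecallEnv, self).__init__()
        self.n_values = n_values
        self.episode_len = episode_len
        self.observation_space = spaces.Discrete(n_values)
        self.action_space = spaces.Discrete(n_values)

    def reset(self):
        # Pre-generate the observation sequence; soln[t] is the
        # observation the agent sees at step t.
        self.soln = np.random.randint(self.n_values, size=self.episode_len)
        self.step_count = 0
        return int(self.soln[0])

    def step(self, action):
        # No feedback until two observations are in the history.
        if self.step_count >= 2:
            reward = -np.abs(action - self.soln[self.step_count - 2])
        else:
            reward = 0.0
        self.step_count += 1
        done = self.step_count >= self.episode_len
        obs = int(self.soln[self.step_count]) if not done else 0
        return obs, float(reward), done, {}
```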
Or, more simply, does anyone have any example code applying `MlpLstmPolicy` to a custom environment that I could refer to? I feel that I must be missing something trivial...
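For context, this is roughly what I have so far (a sketch using the toy `DelayedRecallEnv` above, not verified beyond the basics). The gotchas I know of: recurrent policies need a vectorized env, for PPO2 the number of envs must be divisible by `nminibatches`, and the LSTM state has to be carried manually at predict time:

```python
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Recurrent policies require a vectorized env; n_envs must be
# divisible by nminibatches for PPO2 with an LSTM policy.
n_envs = 4
env = DummyVecEnv([lambda: DelayedRecallEnv() for _ in range(n_envs)])

model = PPO2("MlpLstmPolicy", env, nminibatches=n_envs, verbose=1)
model.learn(total_timesteps=100000)

# At test time, carry the LSTM state manually; predict() expects one
# observation per training env, and mask=done resets finished envs.
obs = env.reset()
state = None
done = [False for _ in range(n_envs)]
for _ in range(100):
    action, state = model.predict(obs, state=state, mask=done)
    obs, reward, done, info = env.step(action)
```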
If I am not mistaken, one thing that might help is to start with Behavior Cloning, available in stable-baselines: https://stable-baselines.readthedocs.io/en/master/guide/pretrain.html#generate-expert-trajectories (though generating the expert trajectories may require some manual tweaking).
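Roughly like this, following the pretrain docs (a sketch only: the scripted expert, the `expert_recall` filename, and the toy env from above are my assumptions; also note the docs pretrain a feed-forward `MlpPolicy`, and I am not sure pretraining handles recurrent policies' states correctly):

```python
from stable_baselines import PPO2
from stable_baselines.gail import generate_expert_traj, ExpertDataset

env = DelayedRecallEnv()

# Scripted "expert": keep a running buffer of observations and answer
# with the one from two steps ago (the first two answers are arbitrary,
# which is fine since they earn no reward in this toy task).
history = []
def expert_fn(obs):
    history.append(int(obs))
    return history[-3] if len(history) >= 3 else 0

# Record expert trajectories to expert_recall.npz.
generate_expert_traj(expert_fn, "expert_recall", env=env, n_episodes=50)

# Pretrain a policy on the recorded data, then fine-tune with RL.
dataset = ExpertDataset(expert_path="expert_recall.npz", batch_size=32)
model = PPO2("MlpPolicy", env, verbose=1)
model = model.pretrain(dataset, n_epochs=100)
model.learn(total_timesteps=100000)
```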
I encountered the same problem. I suspect it may be due to the Windows version (the same code worked on a Windows 7 machine but not on Windows 10...