Costa Huang
Hey @51616 thanks for reporting this issue. This looks very interesting. Would you mind preparing a PR implementing this change? It would be best if you could also make a...
@51616 Thanks for reporting back. This is great to know. Have you tracked the experiments by any chance? I'd love to see the learning curves and stuff. If you have...
@51616 that’s a good point. I’d be careful about committing this change directly to the existing files. Maybe you can create a folder called `ppo_with_proper_entropy_tuning` and create the three ppo files...
Hey, thanks for running these tests. I think the best way to proceed from this is to make a PR and put everything under a folder like `ppo_with_proper_entropy_tuning`, and...
https://github.com/vwxyzjn/gym-microrts/commit/f50994d49459158f47b9150acdf3ed00d204c7af roughly shows the required changes, which involve about 20 lines of code. As a demo, try the following command:

```
# to start
python ppo_autoregressive.py \
    --wandb-project-name gym-microrts...
```
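For context, here is a minimal sketch of what "autoregressive" means here (the class name, layer sizes, and action components below are illustrative, not the actual gym-microrts diff): later action components are sampled conditioned on earlier ones, and the joint log-probability is the sum of the per-component log-probabilities.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class AutoregressivePolicy(nn.Module):
    """Illustrative two-component policy: sample an action *type* first,
    then condition the parameter head on the sampled type."""

    def __init__(self, obs_dim=16, n_types=6, n_params=8):
        super().__init__()
        self.body = nn.Linear(obs_dim, 32)
        self.type_head = nn.Linear(32, n_types)
        self.type_embed = nn.Embedding(n_types, 32)
        self.param_head = nn.Linear(32, n_params)

    def forward(self, obs):
        h = torch.tanh(self.body(obs))
        type_dist = Categorical(logits=self.type_head(h))
        a_type = type_dist.sample()
        # Condition the second component on the first sampled component.
        param_dist = Categorical(logits=self.param_head(h + self.type_embed(a_type)))
        a_param = param_dist.sample()
        # Joint log-prob is the sum over components (chain rule of probability).
        logprob = type_dist.log_prob(a_type) + param_dist.log_prob(a_param)
        return a_type, a_param, logprob

policy = AutoregressivePolicy()
a_type, a_param, logprob = policy(torch.randn(4, 16))
print(a_type.shape, logprob.shape)
```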
Maybe @araffin is saying the states could be reset multiple times during training, depending on the number of dones in the data collection?
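A minimal sketch of the reset-on-done bookkeeping being discussed (shapes and names are illustrative): the LSTM state of each environment is zeroed wherever `done` is 1, and the same masking has to be replayed when the trajectory is re-run during training.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8)
num_envs = 3
h = torch.zeros(1, num_envs, 8)  # (num_layers, batch, hidden)
c = torch.zeros(1, num_envs, 8)

obs = torch.randn(1, num_envs, 4)       # one step for 3 parallel envs
done = torch.tensor([0.0, 1.0, 0.0])    # env 1 just finished its episode

# Zero the state wherever done == 1; training must apply these same
# resets when re-running the LSTM over the collected trajectory.
mask = (1.0 - done).view(1, num_envs, 1)
out, (h, c) = lstm(obs, (h * mask, c * mask))
print(h.shape)  # torch.Size([1, 3, 8])
```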
Right, but you should apply the same principle during training. So a quick way to see if you have implemented it correctly is to print out the ratio. If during...
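A quick illustration of that sanity check (toy tensors, not the actual training loop): before the first gradient step, re-evaluating the collected actions with the current policy must reproduce the stored log-probabilities, so the PPO ratio is exactly 1.

```python
import torch

# Log-probs stored during rollout, and log-probs recomputed by the current
# policy on the same actions. On the first minibatch of the first epoch the
# policy has not been updated yet, so the two are identical.
old_logprobs = torch.randn(64)
new_logprobs = old_logprobs.clone()

ratio = (new_logprobs - old_logprobs).exp()
assert torch.allclose(ratio, torch.ones_like(ratio))
print(ratio.mean().item())  # 1.0
```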
> Do you mean in the sample trajectory, there would be done=True in the middle?

Yes.

> Actually this is never happening in the current implementation

This seems unusual to...
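To make the concern concrete, here is a hedged sketch of generalized advantage estimation that respects mid-trajectory dones (the function and the convention that `dones[t]` marks the transition that ended the episode are illustrative): both the bootstrap term and the λ-recursion are cut at episode boundaries, so no value leaks across a reset.

```python
import numpy as np

def gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """GAE over a rollout that may contain done=True in the middle."""
    T = len(rewards)
    advantages = np.zeros(T)
    lastgaelam = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # 0 if the episode ended at step t
        nextval = next_value if t == T - 1 else values[t + 1]
        # Bootstrapping is masked out at terminal steps...
        delta = rewards[t] + gamma * nextval * nonterminal - values[t]
        # ...and so is the accumulated lambda-return from the next episode.
        lastgaelam = delta + gamma * lam * nonterminal * lastgaelam
        advantages[t] = lastgaelam
    return advantages

# Toy rollout with a done at step 1: the advantage at step 1 is just the
# one-step delta, because nothing is bootstrapped across the reset.
adv = gae(rewards=np.array([1.0, 1.0, 1.0, 1.0]),
          values=np.zeros(4),
          dones=np.array([0.0, 1.0, 0.0, 0.0]),
          next_value=0.0)
print(adv)
```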
> yes...

Yeah, this might cause some problems.

> I'll do the refactor in the following week.

If you are looking for a simpler yet authentic reference than openai/baselines, check out...
Hey @tesla-cat, coincidentally we have been building OpenAI Five style bots in [gym-microrts](https://github.com/vwxyzjn/gym-microrts). We have successfully prototyped [PPO + LSTM in the Atari games](https://github.com/vwxyzjn/cleanrl/pull/83), and we...