Costa Huang
Hey @51616 thanks for reporting this issue. This looks very interesting. Would you mind preparing a PR implementing this change? It would be best if you could also make a...
@51616 Thanks for reporting back. This is great to know. Have you tracked the experiments by any chance? I'd love to see the learning curves and stuff. If you have...
@51616 that’s a good point. I’d be careful about committing this change directly to the existing files. Maybe you can create a folder called `ppo_with_proper_entropy_tuning` and create the three ppo files...
Hey, thanks for running these tests. I think the best way to proceed from this is to make a PR and put everything under a folder like `ppo_with_proper_entropy_tuning`, and...
https://github.com/vwxyzjn/gym-microrts/commit/f50994d49459158f47b9150acdf3ed00d204c7af roughly shows the required changes, which involve about 20 lines of code. As a demo, try the following command:

```
# to start
python ppo_autoregressive.py \
    --wandb-project-name gym-microrts...
```
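For context, here is a minimal sketch of what "autoregressive" means here (the class name, layer sizes, and action components below are illustrative, not the actual gym-microrts diff): later action components are sampled conditioned on earlier ones, and the joint log-probability is the sum of the per-component log-probabilities.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class AutoregressivePolicy(nn.Module):
    """Illustrative two-component policy: sample an action *type* first,
    then condition the parameter head on the sampled type."""

    def __init__(self, obs_dim=16, n_types=6, n_params=8):
        super().__init__()
        self.body = nn.Linear(obs_dim, 32)
        self.type_head = nn.Linear(32, n_types)
        self.type_embed = nn.Embedding(n_types, 32)
        self.param_head = nn.Linear(32, n_params)

    def forward(self, obs):
        h = torch.tanh(self.body(obs))
        type_dist = Categorical(logits=self.type_head(h))
        a_type = type_dist.sample()
        # Condition the second component on the first sampled component.
        param_dist = Categorical(logits=self.param_head(h + self.type_embed(a_type)))
        a_param = param_dist.sample()
        # Joint log-prob is the sum over components (chain rule of probability).
        logprob = type_dist.log_prob(a_type) + param_dist.log_prob(a_param)
        return a_type, a_param, logprob

policy = AutoregressivePolicy()
a_type, a_param, logprob = policy(torch.randn(4, 16))
print(a_type.shape, logprob.shape)
```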
Maybe @araffin is saying the states could be reset multiple times during training, depending on the number of dones in the data collection?
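A minimal sketch of the reset-on-done bookkeeping being discussed (shapes and names are illustrative): the LSTM state of each environment is zeroed wherever `done` is 1, and the same masking has to be replayed when the trajectory is re-run during training.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8)
num_envs = 3
h = torch.zeros(1, num_envs, 8)  # (num_layers, batch, hidden)
c = torch.zeros(1, num_envs, 8)

obs = torch.randn(1, num_envs, 4)       # one step for 3 parallel envs
done = torch.tensor([0.0, 1.0, 0.0])    # env 1 just finished its episode

# Zero the state wherever done == 1; training must apply these same
# resets when re-running the LSTM over the collected trajectory.
mask = (1.0 - done).view(1, num_envs, 1)
out, (h, c) = lstm(obs, (h * mask, c * mask))
print(h.shape)  # torch.Size([1, 3, 8])
```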
Right, but you should apply the same principle during training. So a quick way to see if you have implemented it correctly is to print out the ratio. If during...
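A quick illustration of that sanity check (toy tensors, not the actual training loop): before the first gradient step, re-evaluating the collected actions with the current policy must reproduce the stored log-probabilities, so the PPO ratio is exactly 1.

```python
import torch

# Log-probs stored during rollout, and log-probs recomputed by the current
# policy on the same actions. On the first minibatch of the first epoch the
# policy has not been updated yet, so the two are identical.
old_logprobs = torch.randn(64)
new_logprobs = old_logprobs.clone()

ratio = (new_logprobs - old_logprobs).exp()
assert torch.allclose(ratio, torch.ones_like(ratio))
print(ratio.mean().item())  # 1.0
```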
> Do you mean in the sample trajectory, there would be done=True in the middle?

Yes.

> Actually this is never happening in the current implementation

This seems unusual to...
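To make the concern concrete, here is a hedged sketch of generalized advantage estimation that respects mid-trajectory dones (the function and the convention that `dones[t]` marks the transition that ended the episode are illustrative): both the bootstrap term and the λ-recursion are cut at episode boundaries, so no value leaks across a reset.

```python
import numpy as np

def gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """GAE over a rollout that may contain done=True in the middle."""
    T = len(rewards)
    advantages = np.zeros(T)
    lastgaelam = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # 0 if the episode ended at step t
        nextval = next_value if t == T - 1 else values[t + 1]
        # Bootstrapping is masked out at terminal steps...
        delta = rewards[t] + gamma * nextval * nonterminal - values[t]
        # ...and so is the accumulated lambda-return from the next episode.
        lastgaelam = delta + gamma * lam * nonterminal * lastgaelam
        advantages[t] = lastgaelam
    return advantages

# Toy rollout with a done at step 1: the advantage at step 1 is just the
# one-step delta, because nothing is bootstrapped across the reset.
adv = gae(rewards=np.array([1.0, 1.0, 1.0, 1.0]),
          values=np.zeros(4),
          dones=np.array([0.0, 1.0, 0.0, 0.0]),
          next_value=0.0)
print(adv)
```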
> yes...

Yeah, this might cause some problems.

> I'll do the refactor in the following week.

If you are looking for a simpler yet authentic reference than openai/baselines, check out...
Hey @tesla-cat, coincidentally we have been building OpenAI Five style bots in [gym-microrts](https://github.com/vwxyzjn/gym-microrts). We have successfully prototyped [PPO + LSTM in the Atari games](https://github.com/vwxyzjn/cleanrl/pull/83), and we...