New A2C example with entropy
Hi it is me again :sweat_smile:
Since I'm working with cherry these days, I thought I'd share my implementation of A2C as the new actor_critic_cartpole.py example.
My implementation matches the baselines one, with the entropy loss and with the mean as reduction instead of the sum. It is divided into two classes (one for the A2C logic and one child class for the actual policy). I have also used only the declarative interface of PyTorch (just because I like it more than the functional one :smiley:)
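For reference, here is a minimal sketch of what such a loss can look like in plain PyTorch; the function and argument names are illustrative, and the coefficient values are common defaults rather than the exact ones used in this PR:

```python
import torch

def a2c_loss(log_probs, values, returns, entropies,
             value_coef=0.5, entropy_coef=0.01):
    # Advantage: how much better the taken action was than the value estimate.
    advantages = returns - values.detach()

    # Policy-gradient term, mean-reduced instead of summed.
    policy_loss = -(log_probs * advantages).mean()

    # Value regression term, also mean-reduced.
    value_loss = (returns - values).pow(2).mean()

    # Entropy bonus: penalizing low entropy encourages exploration.
    entropy_loss = -entropies.mean()

    return policy_loss + value_coef * value_loss + entropy_coef * entropy_loss
```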
I think that a different coding style with respect to the policy gradient example might be useful to have in the examples folder.
I have tested it and it should be safe to merge, but of course more benchmarking is still required (that is never enough :smiley:)
Edit: If you like it, wait before merging it. The rewards should not be normalized and I should run the environment for only a fixed number of steps, but even with those fixes I am getting values for the value_loss that are inconsistent with the ones from baselines.
I've just pushed 3 more commits; the changes with respect to the first one are:
- Multiple workers (as in A3C)
- RMSprop optimizer instead of Adam
- Avoid using the Runner wrapper (I think there is a bug when the environment is vectorized)
- No rewards normalization
I followed the paper *Asynchronous Methods for Deep Reinforcement Learning*, and with these changes the implementation should be identical to the baselines one.
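For context, a rough sketch of what that setup could look like; the worker count, the gym vectorized-environment call, and the RMSprop hyperparameters (the usual A3C/baselines-style defaults) are assumptions, not the exact code of this PR:

```python
import gym
import torch
from torch import optim

num_workers = 16  # assumed value; A2C typically runs several synchronous workers

# Vectorized CartPole: one synchronous copy of the environment per worker.
envs = gym.vector.make("CartPole-v1", num_envs=num_workers)

# Placeholder policy network for a 4-dim observation, 2-action environment.
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 2),
)

# RMSprop instead of Adam, with A3C/baselines-style hyperparameters.
optimizer = optim.RMSprop(policy.parameters(), lr=7e-4, alpha=0.99, eps=1e-5)
```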
Thanks again for a nice PR @galatolofederico !
I would like to keep a copy of the original actor-critic example, as it's useful for newcomers to compare against the PyTorch examples implementation. What do you think about creating a folder examples/cartpole and having the reinforce, actor-critic, and yours (actor_critic_declarative.py ?) in there ?
When you say it's identical to the baselines one, have you tried it on Atari or MuJoCo/PyBullet ? I've never gotten around to thoroughly benchmarking our A2C implementation on Atari, and it doesn't work out-of-the-box for PyBullet. If yours did, it would be interesting to update/have dedicated scripts for that.
Finally, could you also add one line in CHANGELOG.md about this PR ? Thanks!
Sure, it would be nice to have both versions! I am running some Atari benchmarks right now; I should have the results in a couple of days. For now let's leave this PR open, I have made some improvements and I still have to push the changes.
Hi, I spent a couple of days benchmarking and testing my A2C implementation, and I came to the conclusion that there are some bugs in cherry when using vectorized environments. I wrote some wrappers and a replay memory from scratch (with the same interface as cherry) and used my vectorized A2C implementation (slightly different from the one in this PR): it works with my wrappers and replay memory, but it does not with cherry. I do not have time right now, but as soon as I have some spare time I'll try to write an MWE. I will also implement other algorithms using my classes and release everything as open-source code, in the hope that it will help with debugging cherry.
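As a point of reference only, here is a generic sketch of the kind of minimal replay-memory interface being described; the class and method names are illustrative and are not claimed to match cherry's actual API or the author's code:

```python
import random
from collections import namedtuple

# Illustrative transition tuple; real libraries store richer transition objects.
Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayMemory:
    """Minimal list-backed replay memory for storing rollout transitions."""

    def __init__(self):
        self.storage = []

    def append(self, state, action, reward, next_state, done):
        self.storage.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling; on-policy A2C usually consumes the whole rollout instead.
        return random.sample(self.storage, batch_size)

    def empty(self):
        self.storage.clear()

    def __len__(self):
        return len(self.storage)
```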