Future improvements
First, hands down, amazing work. With this serving as a baseline, I see one possible improvement, if someone wants to implement it:
- The n-step return, as implemented, is biased, since it is computed from old off-policy samples drawn from the replay buffer. Retrace [Safe and Efficient Off-Policy Reinforcement Learning] would resolve the issue. Implementing Retrace in distributional RL is not straightforward, but [The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning] deals with the problem (apparently without quantile regression, though).
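For concreteness, here is a rough NumPy sketch of the Retrace(λ) target for a single stored trajectory, following the recursion in the Retrace paper; all names and shapes are illustrative, not code from this repository:

```python
import numpy as np

def retrace_targets(q, pi, mu, actions, rewards, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for one stored trajectory of length T.
    q:       (T+1, A) Q-values from the target network, Q(x_t, .)
    pi:      (T+1, A) current policy probabilities pi(.|x_t)
    mu:      (T,)     behaviour probabilities mu(a_t|x_t) saved in replay
    actions: (T,)     actions a_t actually taken
    rewards: (T,)     rewards r_t
    Returns  (T,)     targets for Q(x_t, a_t)."""
    T = len(rewards)
    targets = np.empty(T)
    # Bootstrap from the expected Q-value under the current policy.
    next_ret = np.dot(pi[T], q[T])
    for t in reversed(range(T)):
        targets[t] = rewards[t] + gamma * next_ret
        # Truncated importance weight c_t = lam * min(1, pi/mu) cuts the trace
        # where the stored (off-policy) action is now unlikely under pi,
        # which is what removes the bias of the plain n-step return.
        c_t = lam * min(1.0, pi[t, actions[t]] / mu[t])
        next_ret = np.dot(pi[t], q[t]) + c_t * (targets[t] - q[t, actions[t]])
    return targets
```

In Rainbow's value-based setting, pi would be the (near-)greedy policy of the online network and mu the behaviour probabilities recorded when the transition was stored; the Reactor paper then extends this idea to distributional targets.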
Thanks a lot, I can say the same for your blog 🙂 In my spare time I'm still looking for bugs, as learning is still problematic on some games (Pong and Breakout being particularly worrying).
In any case I won't be able to work on extending Rainbow for a while, but if anyone is interested I'm leaving master as a pure Rainbow implementation, with anything others want to add going in as optional extensions (I added quantile regression as an exercise for myself, but I haven't tested it within Rainbow at all, so it's possible that it's harmful here).
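For anyone curious what the quantile regression option amounts to: it presumably follows QR-DQN (Dabney et al., 2017), where the categorical (C51) cross-entropy is swapped for a quantile Huber loss. A minimal PyTorch sketch of that loss (illustrative names and shapes, not the code in this repo):

```python
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """pred_quantiles:   (batch, N)  theta_i(x, a) for the actions taken
       target_quantiles: (batch, N') detached targets r + gamma * theta_j(x', a*)
    Returns the scalar quantile Huber loss of QR-DQN."""
    N = pred_quantiles.size(1)
    # Midpoint quantile fractions tau_hat_i = (2i + 1) / (2N).
    taus = (torch.arange(N, dtype=pred_quantiles.dtype,
                         device=pred_quantiles.device) + 0.5) / N
    # Pairwise TD errors u[b, j, i] = target_j - pred_i.
    u = target_quantiles.unsqueeze(2) - pred_quantiles.unsqueeze(1)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| turns the Huber loss into a
    # quantile regression against the target distribution.
    loss = (taus.view(1, 1, N) - (u < 0).float()).abs() * huber / kappa
    # Sum over predicted quantiles i, average over target samples j and batch.
    return loss.mean(dim=1).sum(dim=-1).mean()
```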
Edit: False alarm on Pong at least
Hello,
I am reopening this really old issue just to ask a question. Did you test the QR extension on any games other than Pong? (And even better, did you try to implement Implicit Quantile Networks? ^^)
Nope - I just did that as an implementation exercise for myself, so I've not actually tested it at all (the comment about Pong was about normal Rainbow at the time). I'm not planning to do any further development, but I am trying to test normal Rainbow on a few more games and upload the pretrained models for people to use.
I also noticed that Rainbow performs very badly on Pong, which is extremely strange.