minimalRL
Implementations of basic RL algorithms with minimal lines of code! (PyTorch-based)
Hi, first of all, thanks for your awesome code. This is not about any technical issue, but about the algorithm in the DDPG code. As far as I know, the...
If you train PPO far enough, like 3000 episodes or more, the reward drops (e.g. from 500 down to 30).
It shows better performance and better readability than the current version.
[pytorch-lightning](https://pytorch-lightning.readthedocs.io) allows for [less boilerplate](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#why-pytorch-lightning) and [more optimization](https://pytorch-lightning.readthedocs.io/en/latest/performance.html). Maybe it should be used to allow for easier reuse of the code.
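As a rough illustration of what that would look like, here is a minimal sketch of wrapping a simple policy network in a `LightningModule`; the class name, network sizes, and loss are illustrative assumptions, not code from this repo.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitPolicy(pl.LightningModule):
    """Hypothetical Lightning wrapper around a small policy network."""
    def __init__(self, obs_dim=4, n_actions=2, lr=5e-4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_actions))
        self.lr = lr

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

    def training_step(self, batch, batch_idx):
        # batch would come from a rollout buffer; the loss here is a placeholder
        states, actions, returns = batch
        probs = self(states)
        log_probs = torch.log(probs.gather(1, actions) + 1e-8)
        loss = -(log_probs * returns).mean()
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)
```

Lightning would then handle the training loop, device placement, and logging via `pl.Trainer`, which is where the boilerplate reduction comes from.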
May I suggest implementing TD3 as your algorithm 9? [TD3](https://arxiv.org/abs/1802.09477) Great job so far :)
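For reference, the core of TD3 is the target computation with twin critics and target policy smoothing, plus delayed actor updates. A hedged sketch of the target step is below; `mu_target`, `q1_target`, `q2_target`, and the hyperparameters are illustrative assumptions, not repo code.

```python
import torch

def td3_target(r, s_prime, done_mask, mu_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action
        noise = (torch.randn_like(mu_target(s_prime)) * policy_noise).clamp(-noise_clip, noise_clip)
        a_prime = (mu_target(s_prime) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: take the minimum of the two target critics
        q_prime = torch.min(q1_target(s_prime, a_prime), q2_target(s_prime, a_prime))
        target = r + gamma * q_prime * done_mask
    return target
```

The actor and target networks are then only updated every few critic updates (delayed policy updates), which is the third change TD3 makes on top of DDPG.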
As far as I understand, the policy network outputs a mean and variance for a single action. After that, torch.gather is used to compute the log_prob. Can someone...
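For the discrete-action case, `torch.gather` is simply used to pick out, per transition, the probability of the action that was actually taken from the categorical policy output. A small illustrative example (not repo code):

```python
import torch

probs = torch.tensor([[0.7, 0.3],
                      [0.2, 0.8]])      # policy output: one row per state, one column per action
actions = torch.tensor([[0], [1]])      # actions taken, shape (batch, 1)
pi_a = probs.gather(1, actions)         # -> tensor([[0.7], [0.8]])
log_prob = torch.log(pi_a)              # log-probability of the chosen actions
```

A mean/variance output would only appear in a continuous-action variant, where the log-prob comes from the Gaussian density rather than from `gather`.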
Hey there, would it be wise to include an entropy term in your PPO implementation? How would one do that? My second question is why you do not use a 0.5 * MSE loss...
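One common way to do this is to subtract a small entropy bonus from the PPO loss. A hedged sketch (the coefficients and tensor names here are illustrative assumptions, not from this repo):

```python
import torch
import torch.nn.functional as F

def ppo_loss(ratio, advantage, values, returns, probs,
             eps_clip=0.1, vf_coef=0.5, ent_coef=0.01):
    # Clipped surrogate objective
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage
    policy_loss = -torch.min(surr1, surr2).mean()
    # Value loss (0.5 * MSE is one common convention; smooth_l1 is another)
    value_loss = 0.5 * F.mse_loss(values, returns)
    # Entropy of the categorical policy; subtracting it encourages exploration
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```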
env.reset() returns an additional value, "info"; env.step() returns an additional value, "truncated".
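This matches the gym >= 0.26 / gymnasium API. A minimal sketch of adapting a rollout loop (the random-action policy is just a placeholder):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()                   # reset() now also returns an info dict
done = False
while not done:
    action = env.action_space.sample()    # placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated        # episode ends on either flag
env.close()
```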
The README says "Every algorithm can be trained within 30 seconds, even without GPU", but this is false. The two places marked in the picture stalled for a long time, and DQN training did not...