minimalRL
Implementations of basic RL algorithms with minimal lines of code! (PyTorch-based)
Hi, first of all, thanks for your awesome code. This is not about any technical issue, but about the algorithm in the DDPG code. As far as I know, the...
If you train PPO far enough, like 3000 episodes or more, the reward drops (e.g. from 500 down to 30).
It shows better performance and better readability than the current version.
[pytorch-lightning](https://pytorch-lightning.readthedocs.io) allows for [less boilerplate](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#why-pytorch-lightning) and [more optimization](https://pytorch-lightning.readthedocs.io/en/latest/performance.html). Maybe it should be used to allow for easier reuse of the code.
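As a rough illustration of what that would look like, here is a minimal sketch of wrapping a simple policy network in a `LightningModule`; the class name, network sizes, and loss are illustrative assumptions, not code from this repo.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitPolicy(pl.LightningModule):
    """Hypothetical Lightning wrapper around a small policy network."""
    def __init__(self, obs_dim=4, n_actions=2, lr=5e-4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_actions))
        self.lr = lr

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

    def training_step(self, batch, batch_idx):
        # batch would come from a rollout buffer; the loss here is a placeholder
        states, actions, returns = batch
        probs = self(states)
        log_probs = torch.log(probs.gather(1, actions) + 1e-8)
        loss = -(log_probs * returns).mean()
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)
```

Lightning would then handle the training loop, device placement, and logging via `pl.Trainer`, which is where the boilerplate reduction comes from.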
May I suggest implementing TD3 as your algorithm 9? [TD3](https://arxiv.org/abs/1802.09477) Great job so far :)
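For reference, the core of TD3 is the target computation with twin critics and target policy smoothing, plus delayed actor updates. A hedged sketch of the target step is below; `mu_target`, `q1_target`, `q2_target`, and the hyperparameters are illustrative assumptions, not repo code.

```python
import torch

def td3_target(r, s_prime, done_mask, mu_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action
        noise = (torch.randn_like(mu_target(s_prime)) * policy_noise).clamp(-noise_clip, noise_clip)
        a_prime = (mu_target(s_prime) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: take the minimum of the two target critics
        q_prime = torch.min(q1_target(s_prime, a_prime), q2_target(s_prime, a_prime))
        target = r + gamma * q_prime * done_mask
    return target
```

The actor and target networks are then only updated every few critic updates (delayed policy updates), which is the third change TD3 makes on top of DDPG.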
As far as I understand, the policy network outputs a mean and variance for a single action. After that, torch.gather is used to compute the log_prob. Can someone...
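For the discrete-action case, `torch.gather` is simply used to pick out, per transition, the probability of the action that was actually taken from the categorical policy output. A small illustrative example (not repo code):

```python
import torch

probs = torch.tensor([[0.7, 0.3],
                      [0.2, 0.8]])      # policy output: one row per state, one column per action
actions = torch.tensor([[0], [1]])      # actions taken, shape (batch, 1)
pi_a = probs.gather(1, actions)         # -> tensor([[0.7], [0.8]])
log_prob = torch.log(pi_a)              # log-probability of the chosen actions
```

A mean/variance output would only appear in a continuous-action variant, where the log-prob comes from the Gaussian density rather than from `gather`.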
Hey there, would it be wise to include an entropy term in your PPO implementation? How would one do that? My second question is why you do not use a 0.5 * MSE loss...
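One common way to do this is to subtract a small entropy bonus from the PPO loss. A hedged sketch (the coefficients and tensor names here are illustrative assumptions, not from this repo):

```python
import torch
import torch.nn.functional as F

def ppo_loss(ratio, advantage, values, returns, probs,
             eps_clip=0.1, vf_coef=0.5, ent_coef=0.01):
    # Clipped surrogate objective
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage
    policy_loss = -torch.min(surr1, surr2).mean()
    # Value loss (0.5 * MSE is one common convention; smooth_l1 is another)
    value_loss = 0.5 * F.mse_loss(values, returns)
    # Entropy of the categorical policy; subtracting it encourages exploration
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```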
env.reset() returns an additional value, "info"; env.step() returns an additional value, "truncated".
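This matches the gym >= 0.26 / gymnasium API. A minimal sketch of adapting a rollout loop (the random-action policy is just a placeholder):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()                   # reset() now also returns an info dict
done = False
while not done:
    action = env.action_space.sample()    # placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated        # episode ends on either flag
env.close()
```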
The README says "Every algorithm can be trained within 30 seconds, even without GPU", but this is false. The two places marked in the picture stalled for a long time, and DQN training did not...