Matthew Aitchison

Results: 10 comments by Matthew Aitchison

Small thing: > Here's the episodic rewards after 200M environment steps (50M gradient updates), compared to Fig. 6 in the original paper: should be > Here's the episodic rewards after...

@jseppanen This latest change looks great. It seems, however, that I made a typo in the paper: the entropy bonus should be 1e-2 (0.01), not 0.001, as I have mistakenly...
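For reference, here is a minimal sketch of where an entropy coefficient of this kind typically enters a PPO/PPG-style loss. The function and argument names are illustrative only and are not taken from the implementation discussed above.

```python
import torch

def combined_loss(pg_loss: torch.Tensor, v_loss: torch.Tensor, entropy: torch.Tensor,
                  ent_coef: float = 1e-2, vf_coef: float = 0.5) -> torch.Tensor:
    """Combine policy, value, and entropy terms into one scalar loss.

    The entropy bonus is subtracted so that higher entropy lowers the loss.
    ent_coef here defaults to the corrected value of 1e-2 (0.01), not 0.001.
    """
    return pg_loss - ent_coef * entropy.mean() + vf_coef * v_loss
```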

> I think using the built-in normalization is a good idea. There are a few advantages to your old method, though; one is that you have one set of...
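A minimal sketch of what "built-in normalization" can look like, assuming it refers to Gym/Gymnasium's observation- and reward-normalization wrappers (the thread itself does not say which library is meant); each wrapper keeps its own running mean/variance statistics internally.

```python
import gymnasium as gym

# Hypothetical example: wrap a standard environment with the built-in
# normalization wrappers rather than maintaining custom statistics.
env = gym.make("CartPole-v1")
env = gym.wrappers.NormalizeObservation(env)
env = gym.wrappers.NormalizeReward(env, gamma=0.99)

obs, info = env.reset(seed=0)
for _ in range(10):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
```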

> The large number of parallel agents helps a lot with dealing with gradient noise. Parallel agents are better than a longer n_steps, because longer n_steps produces temporally correlated trajectories....
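An illustrative sketch of that trade-off: many parallel environments with short rollouts, rather than a few environments with a long n_steps. The environment id and the counts below are placeholders, not the settings from the discussion above.

```python
import gymnasium as gym

num_envs = 64   # many parallel agents -> samples at each step are less correlated
n_steps = 16    # short rollout per update

# Hypothetical vectorized setup; SyncVectorEnv steps all copies in lockstep.
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(num_envs)])
obs, info = envs.reset(seed=0)
for _ in range(n_steps):
    actions = envs.action_space.sample()        # batched random actions
    obs, rewards, terminated, truncated, info = envs.step(actions)
```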

> Code looks good and matches the details mentioned in the paper. I really like the idea that just using a different lambda for advantage estimation gives such a huge boost overall. >...
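To make the lambda remark concrete, here is a generic, self-contained sketch of generalized advantage estimation (GAE) in which lambda is an explicit parameter; it is not the code under review, but it shows where a different lambda value would plug in (for example, one value for the policy advantages and another for the value targets).

```python
import numpy as np

def compute_gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """Generic GAE over 1-D arrays; `lam` trades bias (low) against variance (high)."""
    advantages = np.zeros(len(rewards), dtype=np.float64)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        next_val = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_val * nonterminal - values[t]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[t] = last_gae
    return advantages
```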

> OK so some news: > > I reverted the hyperparameter change (commit [0fe7b1f](https://github.com/vwxyzjn/cleanrl/commit/0fe7b1f1b4d846cc6e61e5ddb1547e5664bbb6e3)) and the learning stage ordering change (commit [e50bf9e](https://github.com/vwxyzjn/cleanrl/commit/e50bf9e220360e3fc049316c1dbaa825f15b9e4b)). The reason is that I did some investigation...

This might be a TensorFlow version thing. I'm on v1.14 and am also having this problem. I tried switching the policy from RNN to CNN and it seems to work, but I...

Hi @vwxyzjn, I now have some time I can put into this and would be happy to finish off the last few things that need doing. Looks like @jseppanen has...

I made a fork where I think I've fixed that by unrolling the loop; it can be found here: https://github.com/TheCacophonyProject/VAD. I also made some other small changes to get it...
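As a generic illustration of what unrolling a loop means (this is not the actual change in that fork, and the function below is purely hypothetical), the rolled-up form is replaced by the same iterations written out explicitly, which some tools such as graph tracers handle better than a dynamic loop.

```python
def smooth_rolled(x, n=4):
    # loop form: iteration count is a runtime variable
    for _ in range(n):
        x = 0.5 * x
    return x

def smooth_unrolled(x):
    # unrolled form: the same four iterations written out explicitly
    x = 0.5 * x
    x = 0.5 * x
    x = 0.5 * x
    x = 0.5 * x
    return x
```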

Python would be great! For the moment, I've submitted a PR with just the Octave change. It does slow things down a bit, though, so I only run the Octave...