Matthew Aitchison

Results: 10 comments by Matthew Aitchison

Small thing: > Here's the episodic rewards after 200M environment steps (50M gradient updates), compared to Fig. 6 in the original paper: should be > Here's the episodic rewards after...

@jseppanen This latest change looks great. It seems, however, that I made a typo in the paper: the entropy bonus should be 1e-2 (0.01), not 0.001, as I have mistakenly...
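For reference, here is a minimal sketch of where an entropy coefficient of this kind typically enters a PPO/PPG-style loss. The function and argument names are illustrative only and are not taken from the implementation discussed above.

```python
import torch

def combined_loss(pg_loss: torch.Tensor, v_loss: torch.Tensor, entropy: torch.Tensor,
                  ent_coef: float = 1e-2, vf_coef: float = 0.5) -> torch.Tensor:
    """Combine policy, value, and entropy terms into one scalar loss.

    The entropy bonus is subtracted so that higher entropy lowers the loss.
    ent_coef here defaults to the corrected value of 1e-2 (0.01), not 0.001.
    """
    return pg_loss - ent_coef * entropy.mean() + vf_coef * v_loss
```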

> I think using the built-in normalization is a good idea. There are a few advantages to your old method, though; one is that you have one set of...
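A minimal sketch of what "built-in normalization" can look like, assuming it refers to Gym/Gymnasium's observation- and reward-normalization wrappers (the thread itself does not say which library is meant); each wrapper keeps its own running mean/variance statistics internally.

```python
import gymnasium as gym

# Hypothetical example: wrap a standard environment with the built-in
# normalization wrappers rather than maintaining custom statistics.
env = gym.make("CartPole-v1")
env = gym.wrappers.NormalizeObservation(env)
env = gym.wrappers.NormalizeReward(env, gamma=0.99)

obs, info = env.reset(seed=0)
for _ in range(10):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
```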

> The large number of parallel agents helps a lot with dealing with gradient noise. Parallel agents are better than a longer n_steps, because longer n_steps produces temporally correlated trajectories....
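An illustrative sketch of that trade-off: many parallel environments with short rollouts, rather than a few environments with a long n_steps. The environment id and the counts below are placeholders, not the settings from the discussion above.

```python
import gymnasium as gym

num_envs = 64   # many parallel agents -> samples at each step are less correlated
n_steps = 16    # short rollout per update

# Hypothetical vectorized setup; SyncVectorEnv steps all copies in lockstep.
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(num_envs)])
obs, info = envs.reset(seed=0)
for _ in range(n_steps):
    actions = envs.action_space.sample()        # batched random actions
    obs, rewards, terminated, truncated, info = envs.step(actions)
```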

> Code looks good and matches the details mentioned in the paper. I really like the idea that just using a different lambda for advantage estimation gives such a huge boost overall. >...
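To make the lambda remark concrete, here is a generic, self-contained sketch of generalized advantage estimation (GAE) in which lambda is an explicit parameter; it is not the code under review, but it shows where a different lambda value would plug in (for example, one value for the policy advantages and another for the value targets).

```python
import numpy as np

def compute_gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """Generic GAE over 1-D arrays; `lam` trades bias (low) against variance (high)."""
    advantages = np.zeros(len(rewards), dtype=np.float64)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        next_val = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_val * nonterminal - values[t]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[t] = last_gae
    return advantages
```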

> OK so some news: > > I reverted the hyperparameter change (commit [0fe7b1f](https://github.com/vwxyzjn/cleanrl/commit/0fe7b1f1b4d846cc6e61e5ddb1547e5664bbb6e3)) and the learning stage ordering change (commit [e50bf9e](https://github.com/vwxyzjn/cleanrl/commit/e50bf9e220360e3fc049316c1dbaa825f15b9e4b)). The reason is that I did some investigation...

This might be a TensorFlow version thing. I'm on v1.14 and am also having this problem. I tried switching the policy from RNN to CNN and it seems to work, but I...

Hi @vwxyzjn, I now have some time I can put into this and would be happy to finish off the last few things that need doing. Looks like @jseppanen has...

I made a fork where I think I've fixed that by unrolling the loop; it can be found here: https://github.com/TheCacophonyProject/VAD. I also made some other small changes to get it...
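As a generic illustration of what unrolling a loop means (this is not the actual change in that fork, and the function below is purely hypothetical), the rolled-up form is replaced by the same iterations written out explicitly, which some tools such as graph tracers handle better than a dynamic loop.

```python
def smooth_rolled(x, n=4):
    # loop form: iteration count is a runtime variable
    for _ in range(n):
        x = 0.5 * x
    return x

def smooth_unrolled(x):
    # unrolled form: the same four iterations written out explicitly
    x = 0.5 * x
    x = 0.5 * x
    x = 0.5 * x
    x = 0.5 * x
    return x
```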

Python would be great! For the moment, I've submitted a PR with just the Octave change. It does slow things down a bit, though, so I only run the Octave...