Ethan Caballero
Did you try noising the embedding layer(s) (e.g. fc1)? Did it have a positive or negative effect?
Also, scaling the noise in proportion to the variance it causes in the outputs, as in the OpenAI version, might help as well: https://arxiv.org/abs/1706.01905 ^look at the paragraph titled "Adaptive Noise Scaling" in Section...
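A rough sketch of the adaptive scaling idea, for illustration only: perturb the weights, measure how far the perturbed outputs move from the unperturbed ones, and grow or shrink the noise scale to keep that distance near a target. The linear "policy", the names `adapt_sigma` and `target_distance`, and all constants here are hypothetical, not taken from the paper's code or from any particular repo.

```python
import numpy as np

def adapt_sigma(sigma, distance, target_distance, alpha=1.01):
    """Grow sigma when the noise barely changes the outputs, shrink it otherwise."""
    if distance <= target_distance:
        return sigma * alpha
    return sigma / alpha

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 2))   # hypothetical linear policy: outputs = states @ weights
states = rng.normal(size=(32, 4))   # hypothetical batch of states

sigma = 0.5             # initial noise scale (assumed value)
target_distance = 0.1   # desired output-space distance (assumed value)

for _ in range(2000):
    # Perturb the parameters with Gaussian noise of the current scale.
    noisy_weights = weights + sigma * rng.normal(size=weights.shape)
    # Root-mean-square difference between perturbed and unperturbed outputs.
    distance = np.sqrt(np.mean((states @ noisy_weights - states @ weights) ** 2))
    sigma = adapt_sigma(sigma, distance, target_distance)

print(f"sigma after adaptation: {sigma:.4f}")
```

The point of adapting in output space is that the same parameter noise can have very different effects depending on the layer and the stage of training, so a fixed sigma is hard to tune.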
@dylanthomas did you try running Breakout-v0 for longer than 10M timesteps to see if avg reward eventually got to >400? For example, it took Muupan's A3C https://github.com/muupan/async-rl#a3c-ff 20M timesteps to...
Did you update to v0.1.11? I noticed threads kept dropping until I upgraded to v0.1.11, which was released today.
Threads were dropping with version 0.1.10+2fd4d08. Threads no longer drop with version 0.1.11+8aa1cef.
This might be more plausible with "Hindsight Experience Replay": https://arxiv.org/abs/1707.01495
This Pyro DiCE implementation and use case might be informative: https://github.com/uber/pyro/blob/684c909c7f66ced5408d4ea01dff9259d8b19bd2/pyro/infer/util.py#L109 https://github.com/uber/pyro/blob/ec8714f36de26d11c4d87155f68ba1e3d1868f2d/pyro/infer/traceenum_elbo.py#L17
I do that so the grid search (scipy.optimize.brute) over b and d1 is spaced logarithmically.
Adding 1e-8 is to prevent b and d1 from being equal to zero. If d1 equals zero, you get a NaN because d1 is in a denominator. Now that I...
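For anyone reading along, here is a minimal sketch of what a logarithmically spaced `scipy.optimize.brute` search can look like. It assumes the search runs over base-10 exponents that are mapped back to b and d1 inside the objective, with the 1e-8 offset added after the mapping. The toy model, the synthetic data, and the grid bounds are all made up for illustration and are not the actual fitting code.

```python
import numpy as np
from scipy.optimize import brute

EPS = 1e-8  # keeps b and d1 strictly positive (d1 sits in a denominator)

def toy_model(x, b, d1):
    # Hypothetical stand-in for the real function; d1 appears in a denominator.
    return b * (1.0 + x / d1)

def to_params(exponents):
    # The grid is uniform in the exponents, so b and d1 are spaced logarithmically.
    b_exp, d1_exp = exponents
    return 10.0 ** b_exp + EPS, 10.0 ** d1_exp + EPS

# Synthetic data generated with b = 1e-2 and d1 = 1e1 (assumed values).
x_data = np.array([1.0, 10.0, 100.0, 1000.0])
y_data = toy_model(x_data, b=1e-2, d1=1e1)

def loss(exponents):
    b, d1 = to_params(exponents)
    pred = toy_model(x_data, b, d1)
    # Mean squared error in log space.
    return np.mean((np.log(pred) - np.log(y_data)) ** 2)

# One grid point per decade: exponents -4..2 for b and -2..3 for d1.
exponent_ranges = (slice(-4, 3, 1), slice(-2, 4, 1))

# finish=None returns the best grid point without a local polishing step.
best_exponents = brute(loss, exponent_ranges, finish=None)
best_b, best_d1 = to_params(best_exponents)
print(best_exponents, best_b, best_d1)
```

Searching over exponents rather than raw values spends the same number of grid points on each order of magnitude, which matters when the plausible range of b or d1 spans several decades.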