Ethan Caballero comments

Results: 10 comments by Ethan Caballero

How to noise the LSTM?

Did you try noising the embedding layer(s) (e.g. fc1)? Did it have a positive or negative effect?

Also, scaling the noise in proportion to the variance it causes in the outputs, as in the OpenAI version, might help as well: https://arxiv.org/abs/1706.01905 ^ look at the paragraph titled "Adaptive Noise Scaling" in Section...
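A rough sketch of the adaptive scaling idea, in case it helps: perturb the weights, measure how far the perturbed outputs move from the unperturbed ones, and grow or shrink the noise scale to keep that distance near a target. The toy linear policy, the RMS distance, and the values of `target` and `alpha` are assumptions made for illustration, not settings taken from any particular implementation.

```python
import math
import random


def adapt_sigma(sigma, distance, target, alpha=1.01):
    """Grow sigma if the perturbed policy stayed within the target distance, else shrink it."""
    if distance <= target:
        return sigma * alpha
    return sigma / alpha


def policy_output(weights, state):
    # Toy linear "policy" (hypothetical): bias plus slope times a scalar state.
    return weights[0] + weights[1] * state


def output_distance(weights, noisy_weights, states):
    # Root-mean-square difference between unperturbed and perturbed outputs.
    squared = [
        (policy_output(weights, s) - policy_output(noisy_weights, s)) ** 2
        for s in states
    ]
    return math.sqrt(sum(squared) / len(squared))


def run(steps=200, target=0.1, seed=0):
    rng = random.Random(seed)
    weights = [0.5, -1.0]
    states = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    sigma = 1.0
    for _ in range(steps):
        # Parameter-space noise: perturb the weights, not the outputs.
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        distance = output_distance(weights, noisy, states)
        sigma = adapt_sigma(sigma, distance, target)
    return sigma
```

The point of the update rule is that sigma is tuned in output space, so the same target distance can be reused across layers or architectures whose weights have very different sensitivities.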

expected a Variable arg but got numpy.ndarray error

@dylanthomas Did you try running Breakout-v0 for longer than 10M timesteps to see if the average reward eventually got to >400? For example, it took Muupan's A3C https://github.com/muupan/async-rl#a3c-ff 20M timesteps to...

Sharing Pretrained Checkpoints

bump

Zbranch

Did you update to v0.1.11? I noticed threads kept dropping until I upgraded to v0.1.11, which was released today.

Zbranch

Threads were dropping with version 0.1.10+2fd4d08. Threads no longer drop with version 0.1.11+8aa1cef.

Idea for generating test cases

This might be more plausible with "Hindsight Experience Replay": https://arxiv.org/abs/1707.01495
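For anyone unfamiliar with the paper, here is a minimal sketch of the core relabeling trick (the "final" goal strategy): a failed episode is stored again as if the state it actually reached had been the goal all along. The tuple layout and the 0 / -1 sparse reward are assumptions chosen for illustration, and states are assumed to be directly comparable to goals.

```python
def relabel_with_final_goal(episode):
    """Relabel an episode so the last reached state becomes the goal.

    episode: list of (state, action, next_state, goal) tuples (hypothetical layout).
    Returns a list of (state, action, next_state, new_goal, reward) tuples.
    """
    achieved = episode[-1][2]  # state actually reached at the end of the episode
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        # Sparse reward: 0 when the substituted goal is reached, -1 otherwise.
        reward = 0.0 if next_state == achieved else -1.0
        relabeled.append((state, action, next_state, achieved, reward))
    return relabeled
```

Both the original and the relabeled transitions would normally go into the replay buffer, so the agent still sees at least some rewarded examples even when it never reaches the intended goal.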

Second-order score gradients

This Pyro DiCE implementation and use case might be informative: https://github.com/uber/pyro/blob/684c909c7f66ced5408d4ea01dff9259d8b19bd2/pyro/infer/util.py#L109 https://github.com/uber/pyro/blob/ec8714f36de26d11c4d87155f68ba1e3d1868f2d/pyro/infer/traceenum_elbo.py#L17

Transformation of b and d1

I do that so the grid search (scipy.optimize.brute) over b and d1 is spaced logarithmically.
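To illustrate the general idea (this is not the repository's actual code): a brute-force grid search samples its search variables uniformly, so searching over the exponents and mapping them through `10 ** x` makes the effective values of b and d1 logarithmically spaced. The sketch below is a pure-Python stand-in that does not call scipy; `toy_loss`, the exponent ranges, and the base-10 mapping are all assumptions made for the example.

```python
import itertools
import math


def log_spaced_grid_search(loss, log10_ranges, points_per_axis):
    """Search a uniform grid over exponents; the parameters are 10 ** exponent."""
    axes = []
    for low, high in log10_ranges:
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])

    best_params, best_loss = None, float("inf")
    for exponents in itertools.product(*axes):
        # Uniform steps in the exponent become multiplicative steps in the parameter.
        params = tuple(10.0 ** e for e in exponents)
        value = loss(*params)
        if value < best_loss:
            best_params, best_loss = params, value
    return best_params, best_loss


def toy_loss(b, d1):
    # Hypothetical loss with its minimum at b = 0.01, d1 = 100.
    return (math.log10(b) + 2.0) ** 2 + (math.log10(d1) - 2.0) ** 2


best_params, best_loss = log_spaced_grid_search(
    toy_loss, log10_ranges=[(-4.0, 4.0), (-4.0, 4.0)], points_per_axis=9
)
```

With nine points per axis the candidates for each parameter are 1e-4, 1e-3, and so on up to 1e4, which covers eight orders of magnitude far more evenly than nine linearly spaced values would.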

Transformation of b and d1

Adding 1e-8 is to prevent b and d1 from being equal to zero. If d1 equals zero, you get a NaN because d1 is in a denominator. Now that I...