Steven H. Wang

8 issues by Steven H. Wang

h/t @qxcv `AdversarialTrainer.train()` will repeatedly call `PPO.learn(total_timesteps=gen_batch_size, reset_num_timesteps=False)`, where `gen_batch_size` is usually small compared to conventional RL training. Regardless of whether `reset_num_timesteps=False` is set, `PPO` doesn't know the actual number...
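A minimal, self-contained sketch of the failure mode described above. `ToyLearner` is invented for illustration (it is not the real PPO); it only mimics the stable-baselines-style convention of computing a linear progress schedule from the `total_timesteps` passed to each `learn()` call:

```python
class ToyLearner:
    """Toy stand-in for PPO that tracks a linear progress schedule."""

    def __init__(self):
        self.num_timesteps = 0

    def learn(self, total_timesteps, reset_num_timesteps=True):
        if reset_num_timesteps:
            self.num_timesteps = 0
        else:
            # Mimic extending the target by steps already taken, as
            # stable-baselines-style code does when not resetting.
            total_timesteps += self.num_timesteps
        progress_remaining = []
        while self.num_timesteps < total_timesteps:
            self.num_timesteps += 1
            progress_remaining.append(1 - self.num_timesteps / total_timesteps)
        return progress_remaining

learner = ToyLearner()
first = learner.learn(4, reset_num_timesteps=False)   # ends at progress 0
second = learner.learn(4, reset_num_timesteps=False)  # jumps back up, ends at 0 again
```

Each short call sweeps its schedule all the way to 0, so anything annealed against progress (e.g. the learning rate) sawtooths per call rather than spanning the whole adversarial training run.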

https://github.com/HumanCompatibleAI/adversarial-policies/blob/3a273ea9b7a02c34f95917bb56c1473e9a1af3eb/src/modelfree/common/utils.py#L44

https://github.com/IDSIA/sacred/issues/498 was resolved.

The `pp.inference` API is a mess. All but two of the dozen or so available functions were used only for experimentation/troubleshooting and are irrelevant to the paper, so it might be...

Episode reward summaries are all concentrated on a few steps, with jumps in between.

Zoomed out: ![image](https://user-images.githubusercontent.com/1750835/50369978-20aace00-0553-11e9-91a8-334ca4f405c4.png)

Zoomed in: ![image](https://user-images.githubusercontent.com/1750835/50370046-6ddb6f80-0554-11e9-914f-8b2b8c45270f.png)

Every other summary looks fine: ![image](https://user-images.githubusercontent.com/1750835/50370120-b9dae400-0555-11e9-98b6-92badee5c622.png)

To reproduce, run...

bug

PPO2 uses a `with TensorboardWriter(...) as writer:` context that `flush`es its `tf.summary.FileWriter` but never closes it. In combination with another problem on my side, this led to a "too many...
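A sketch of the kind of fix implied above: have the context manager `close()` the underlying writer on exit rather than only flushing it. `DummyWriter` is a stand-in for `tf.summary.FileWriter` so the sketch runs without TensorFlow, and this `TensorboardWriter` is a simplified stand-in, not the real stable-baselines class:

```python
class DummyWriter:
    """Stand-in for tf.summary.FileWriter (holds an OS file handle)."""

    def __init__(self):
        self.closed = False

    def flush(self):
        pass

    def close(self):
        self.flush()
        self.closed = True


class TensorboardWriter:
    """Simplified context manager that releases the writer on exit."""

    def __init__(self, writer):
        self.writer = writer

    def __enter__(self):
        return self.writer

    def __exit__(self, exc_type, exc, tb):
        # Closing (not just flushing) releases the file handle, avoiding
        # "too many open files" when many writers are created over a run.
        self.writer.close()
        return False


w = DummyWriter()
with TensorboardWriter(w):
    pass  # handle is released as soon as the block exits
```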

bug
help wanted

`--gpu-ids=''` results in a parsing error, since the code tries to cast the empty string to an int.
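A hypothetical parser illustrating both the bug and a fix (`parse_gpu_ids` is invented for illustration, not the repo's actual function): `''.split(',')` yields `['']`, and `int('')` raises `ValueError`, so empty tokens need to be filtered out before casting.

```python
def parse_gpu_ids(arg: str) -> list[int]:
    """Parse a comma-separated GPU id string; '' means no GPUs."""
    # Filter out empty tokens so ``''`` and trailing commas don't
    # reach int() and raise ValueError.
    return [int(tok) for tok in arg.split(",") if tok.strip()]

parse_gpu_ids("")       # -> []
parse_gpu_ids("0,1,2")  # -> [0, 1, 2]
```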

`models/models.py` is looking for {FFG,MVG}Model, not {FFG,MVG}.