Steven H. Wang
h/t @qxcv: `AdversarialTrainer.train()` repeatedly calls `PPO.learn(total_timesteps=gen_batch_size, reset_num_timesteps=False)`, where `gen_batch_size` is usually small compared to conventional RL training. Whether or not `reset_num_timesteps=False` is set, `PPO` doesn't know the actual number...
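A minimal sketch of the problem, using a toy stand-in rather than SB3's real `PPO` class (the class name `ToyPPO` and its internals are assumptions for illustration): when each `learn()` call only receives its own small budget, progress is measured against that per-call horizon, so anything timestep-scheduled (e.g. learning-rate annealing) completes every call instead of over the full run.

```python
class ToyPPO:
    """Toy stand-in (assumption, not SB3's implementation) for how a learner
    tracks progress per `learn()` call."""

    def __init__(self):
        self.num_timesteps = 0

    def learn(self, total_timesteps, reset_num_timesteps=True):
        if reset_num_timesteps:
            self.num_timesteps = 0
            target = total_timesteps
        else:
            # Each call only extends the horizon by its own small budget;
            # the learner never sees the true overall training length.
            target = self.num_timesteps + total_timesteps
        while self.num_timesteps < target:
            self.num_timesteps += 1
        # Progress fraction is relative to `target`, so a schedule based on
        # it would fully anneal within every small call.
        return self.num_timesteps / target


trainer = ToyPPO()
fractions = [trainer.learn(100, reset_num_timesteps=False) for _ in range(5)]
# Every call reports progress 1.0, even though the outer training loop
# continues for many more calls.
```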
https://github.com/HumanCompatibleAI/adversarial-policies/blob/3a273ea9b7a02c34f95917bb56c1473e9a1af3eb/src/modelfree/common/utils.py#L44
https://github.com/IDSIA/sacred/issues/498 was resolved.
The `pp.inference` API is a mess. All but two of the dozen or so functions it exposes were used only for experimentation/troubleshooting and are irrelevant to the paper, so it might be...
Episode reward summaries are all concentrated on a few steps, with jumps in between (see the zoomed-out and zoomed-in plots; every other summary looks fine). To reproduce, run...
PPO2 uses a `with TensorboardWriter(...) as writer:` context that `flush`es but never closes its `tf.summary.FileWriter`. This led (in combination with another problem on my side) to a "too many...
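A minimal sketch of the fix, with a dummy writer standing in for `tf.summary.FileWriter` so the example stays self-contained (the `DummyFileWriter` class and `tensorboard_writer` helper are assumptions, not the library's API): the context manager should `close()` the writer on exit, not just `flush()` it, so the open file descriptor is released.

```python
import contextlib


class DummyFileWriter:
    """Stand-in for tf.summary.FileWriter (assumption: the real class is
    not used here so the sketch runs without TensorFlow)."""

    def __init__(self):
        self.flushed = False
        self.closed = False

    def flush(self):
        self.flushed = True

    def close(self):
        self.closed = True


@contextlib.contextmanager
def tensorboard_writer():
    """Like the PPO2 context, but closes the underlying writer on exit
    instead of only flushing it."""
    writer = DummyFileWriter()
    try:
        yield writer
    finally:
        writer.flush()
        writer.close()  # the missing step: release the open file handle


with tensorboard_writer() as w:
    pass  # log summaries here
# w.flushed and w.closed are both True after the context exits.
```

Without the `close()`, each training run leaks one file descriptor per writer, which eventually hits the OS limit ("too many open files").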
`--gpu-ids=''` results in a parsing error, since the code tries to cast the empty string to an int.
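The original parser isn't shown, so as an assumption this sketch uses `argparse` with a comma-separated flag (the `parse_gpu_ids` helper is hypothetical): treating the empty string as "no GPUs" avoids the crash from `int('')`.

```python
import argparse


def parse_gpu_ids(value):
    """Parse a comma-separated GPU list; treat '' as 'no GPUs' instead of
    crashing on int('')."""
    if not value:
        return []
    return [int(x) for x in value.split(",")]


parser = argparse.ArgumentParser()
parser.add_argument("--gpu-ids", type=parse_gpu_ids, default=[])

assert parser.parse_args(["--gpu-ids="]).gpu_ids == []
assert parser.parse_args(["--gpu-ids=0,1"]).gpu_ids == [0, 1]
```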
`models/models.py` is looking for `{FFG,MVG}Model`, not `{FFG,MVG}`.
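A hypothetical sketch of the mismatch (class and function names here are assumptions based on the issue text, not the repo's actual code): the lookup table stores classes under their full `...Model` names, so resolving a short alias like `"FFG"` requires mapping it onto the registered key.

```python
# Stub classes standing in for the real model definitions.
class FFGModel: ...
class MVGModel: ...

# Registry keyed by full class name, e.g. "FFGModel".
MODELS = {cls.__name__: cls for cls in (FFGModel, MVGModel)}


def get_model(name):
    """Resolve a model by name, accepting either 'FFG' or 'FFGModel'."""
    if name not in MODELS and f"{name}Model" in MODELS:
        name = f"{name}Model"  # map the short alias to the registered key
    return MODELS[name]


assert get_model("FFG") is FFGModel
```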