Adam Gleave issues

Results: 32 issues by Adam Gleave

Add support for stateful policies

`rollout.PolicyCallable` takes an observation and outputs an action, so it only supports stateless policies. By contrast, `BasePolicy.predict` takes an observation, a mask (whether the observation is terminal) and a state (which is reset when...

enhancement
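To make the contrast in the issue above concrete, here is a minimal sketch of a stateful policy callable driven by a rollout loop. Everything in it is hypothetical and only illustrates the idea: the `StatefulPolicy` signature, `counting_policy` and `rollout_steps` are not part of `imitation` or Stable Baselines, and resetting the state at the start of each episode is an assumption about the intended behavior.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Obs = float
Action = int
State = int

# Stateless interface: observations in, actions out.
StatelessPolicy = Callable[[Sequence[Obs]], List[Action]]

# Hypothetical stateful interface: also receives the previous state and a
# per-environment "episode just started" mask, and returns the new state.
StatefulPolicy = Callable[
    [Sequence[Obs], Optional[List[State]], Sequence[bool]],
    Tuple[List[Action], List[State]],
]


def counting_policy(obs, state, episode_start):
    """Toy stateful policy: the action is the number of steps taken so far
    in the current episode, which a stateless policy could not compute."""
    if state is None:
        state = [0] * len(obs)
    # Assumption: state is reset for environments whose episode just started.
    state = [0 if start else s for s, start in zip(state, episode_start)]
    actions = list(state)
    new_state = [s + 1 for s in state]
    return actions, new_state


def rollout_steps(policy, obs_seq, start_seq):
    """Thread the policy state through successive steps of a rollout."""
    state = None
    all_actions = []
    for obs, starts in zip(obs_seq, start_seq):
        actions, state = policy(obs, state, starts)
        all_actions.append(actions)
    return all_actions
```

The key design point is that the rollout loop, not the policy, owns the state and the mask, so a stateless policy can still be adapted by ignoring both arguments.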

Benchmark and replicate algorithm performance

Tune hyperparameters / match implementation details / fix bugs until we replicate the performance of reference implementations of algorithms. I'm not concerned about an exact match -- if we do...

Uncomment `getitem` type annotations once pytype bug fixed

We commented out some type annotations to work around https://github.com/google/pytype/issues/1108 in https://github.com/HumanCompatibleAI/imitation/pull/393

Add script for MCEIRL Algorithm

The maximum causal entropy IRL algorithm is implemented in https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/algorithms/mce_irl.py but there is currently no training script for it in https://github.com/HumanCompatibleAI/imitation/tree/master/src/imitation/scripts

MCE IRL: add support for state-action rewards

Currently only state-based rewards are supported. Ideally we'd allow state-action based rewards as well. This would be easy from the RewardNet side, but would also require support to calculate state-action...

enhancement

Consistent interface for algorithms & single training script

Currently we don't have any CLI script for behavioral cloning or the density baseline. I envisage this codebase as being particularly useful in being able to rapidly benchmark against a...

enhancement

Make wrapped reward returns accessible à la Monitor

Currently `RewardVecEnvWrapper` replaces the reward directly, and internally keeps track of episode return that it logs using the `log_callback`. However, we often apply subsequent wrappers such as `VecNormalize` that change...
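One way to picture the request is a wrapper that still replaces the reward but also records the wrapped-reward episode return in the `info` dict when an episode ends, similar in spirit to how `Monitor` reports episode statistics. The sketch below uses plain Python with no dependencies; `ToyVecEnv`, `WrappedRewardRecorder` and the `"wrapped_episode"` key are hypothetical names chosen for illustration, not the actual `RewardVecEnvWrapper` implementation.

```python
class ToyVecEnv:
    """Hypothetical stand-in for a vectorized env; episodes last `horizon` steps."""

    def __init__(self, num_envs, horizon):
        self.num_envs = num_envs
        self.horizon = horizon
        self.t = [0] * num_envs

    def reset(self):
        self.t = [0] * self.num_envs
        return list(self.t)

    def step(self, actions):
        obs, rews, dones, infos = [], [], [], []
        for i in range(self.num_envs):
            self.t[i] += 1
            done = self.t[i] >= self.horizon
            if done:
                self.t[i] = 0  # auto-reset, as vectorized envs usually do
            obs.append(self.t[i])
            rews.append(1.0)  # original environment reward
            dones.append(done)
            infos.append({})
        return obs, rews, dones, infos


class WrappedRewardRecorder:
    """Replaces the reward and exposes the wrapped episode return via `info`."""

    def __init__(self, venv, reward_fn):
        self.venv = venv
        self.reward_fn = reward_fn
        self._returns = [0.0] * venv.num_envs
        self._lengths = [0] * venv.num_envs

    def reset(self):
        self._returns = [0.0] * self.venv.num_envs
        self._lengths = [0] * self.venv.num_envs
        return self.venv.reset()

    def step(self, actions):
        obs, _, dones, infos = self.venv.step(actions)
        rews = [self.reward_fn(o, a) for o, a in zip(obs, actions)]
        for i, (rew, done) in enumerate(zip(rews, dones)):
            self._returns[i] += rew
            self._lengths[i] += 1
            if done:
                # Stored under a separate key so that later wrappers which
                # rescale the reward leave this record untouched.
                infos[i]["wrapped_episode"] = {
                    "r": self._returns[i],
                    "l": self._lengths[i],
                }
                self._returns[i] = 0.0
                self._lengths[i] = 0
        return obs, rews, dones, infos
```

Because the record travels in `info` rather than in the reward itself, any wrapper applied afterwards can transform the reward without changing the reported return.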

Debug flag affects agent input and behavior

Specifying the debug flag changes not just the visualization, but also the camera input from the agent; see attached screen capture. This substantially changes the behavior of agents that depend...

challenge

Checkpointing support with Ray Tune

It would be nice to make `modelfree.hyperparams.train_rl` a `tune.Trainable` rather than a function, adding checkpointing support. This would let us use the HyperBand and Population Based Training schedulers. Conceptually this...

enhancement
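The shape of a class-based, checkpointable trainer can be sketched without Ray at all. The class below is a hypothetical, dependency-free illustration of the pattern (one training iteration per `step`, with state that can be saved and restored so a scheduler could pause, clone or resume a trial). It deliberately does not subclass or reproduce `tune.Trainable`, whose exact method signatures vary between Ray versions and should be checked against the Ray documentation.

```python
import json
import os
import tempfile


class CheckpointableTrainer:
    """Hypothetical class-based trainer; not Ray Tune's actual API."""

    def __init__(self, config):
        self.lr = config["lr"]
        self.iteration = 0
        self.score = 0.0

    def step(self):
        # Placeholder for one round of RL training.
        self.iteration += 1
        self.score += self.lr
        return {"iteration": self.iteration, "score": self.score}

    def save_checkpoint(self, checkpoint_dir):
        # Persist everything needed to continue training later.
        path = os.path.join(checkpoint_dir, "state.json")
        with open(path, "w") as f:
            json.dump({"iteration": self.iteration, "score": self.score}, f)
        return path

    def load_checkpoint(self, path):
        with open(path) as f:
            state = json.load(f)
        self.iteration = state["iteration"]
        self.score = state["score"]


def pause_and_resume(config, steps_before, steps_after):
    """Train, checkpoint, then continue in a fresh trainer instance."""
    trainer = CheckpointableTrainer(config)
    for _ in range(steps_before):
        trainer.step()
    with tempfile.TemporaryDirectory() as tmp:
        path = trainer.save_checkpoint(tmp)
        resumed = CheckpointableTrainer(config)
        resumed.load_checkpoint(path)
    result = {}
    for _ in range(steps_after):
        result = resumed.step()
    return result
```

A plain training function cannot be paused partway through, which is why schedulers that stop and restart trials need this save and restore pair.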

Policy serializing

- [ ] Use new format for VecNormalize: https://github.com/hill-a/stable-baselines/pull/525
- [ ] Switch to context manager to ensure policies are closed?
- [ ] Consider switching to `BasePolicy` rather than...
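For the context manager item in the checklist, a minimal sketch using the standard library's `contextlib.contextmanager` might look as follows. `DummyPolicy` and `load_policy` are hypothetical placeholders; the real loading and cleanup logic would depend on how policies are actually serialized.

```python
from contextlib import contextmanager


class DummyPolicy:
    """Hypothetical policy holding a resource that must be released."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def predict(self, obs):
        if self.closed:
            raise RuntimeError("policy has been closed")
        return 0  # placeholder action

    def close(self):
        self.closed = True


@contextmanager
def load_policy(path):
    """Yield a loaded policy and close it on exit, even if an error occurs."""
    policy = DummyPolicy(path)
    try:
        yield policy
    finally:
        policy.close()
```

Used as `with load_policy(path) as policy: ...`, the policy is closed when the block exits, whether normally or through an exception.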