imitation issues

Benchmark and replicate algorithm performance

8

Tune hyperparameters / match implementation details / fix bugs until we replicate the performance of reference implementations of algorithms. I'm not concerned about an exact match -- if we do...

AdamGleave

Uncomment `getitem` type annotations once pytype bug fixed

We commented out some type annotations in https://github.com/HumanCompatibleAI/imitation/pull/393 to work around https://github.com/google/pytype/issues/1108

AdamGleave

Add script for MCEIRL Algorithm

The maximum causal entropy IRL algorithm is implemented in https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/algorithms/mce_irl.py but there is currently no training script for it in https://github.com/HumanCompatibleAI/imitation/tree/master/src/imitation/scripts

AdamGleave

MCE IRL: add support for state-action rewards

Currently only state-based rewards are supported. Ideally we'd allow state-action based rewards as well. This would be easy from the RewardNet side, but would also require support to calculate state-action...

AdamGleave

enhancement

Consistent interface for algorithms & single training script

17

Currently we don't have any CLI script for behavioral cloning or the density baseline. I envisage this codebase as being particularly useful in being able to rapidly benchmark against a...

AdamGleave

enhancement

Support image-based observation spaces in same way as SB3

5

At the moment, GAIL and BC don't interoperate well with SB3 in environments with image-based observation spaces. The main problem is the channels axis: many environments put channels last, but...

qxcv
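
A note on the channels-axis point in the issue above: the following is a minimal sketch, assuming the consumer of the observations expects channels-first input (as PyTorch convolution layers do). The helper name `to_channels_first` is hypothetical and is not part of imitation or SB3; SB3 ships its own `VecTransposeImage` wrapper for vectorized environments.

```python
import numpy as np


def to_channels_first(obs: np.ndarray) -> np.ndarray:
    """Move the channel axis from last to first (hypothetical helper).

    Accepts a single image of shape (H, W, C) or a batch of shape (N, H, W, C).
    """
    if obs.ndim == 3:
        return np.transpose(obs, (2, 0, 1))
    if obs.ndim == 4:
        return np.transpose(obs, (0, 3, 1, 2))
    raise ValueError(f"expected 3 or 4 dimensions, got {obs.ndim}")


frame = np.zeros((84, 84, 3), dtype=np.uint8)      # channels-last image
batch = np.zeros((8, 84, 84, 3), dtype=np.uint8)   # batch of channels-last images
print(to_channels_first(frame).shape)  # (3, 84, 84)
print(to_channels_first(batch).shape)  # (8, 3, 84, 84)
```

Any component that reads these observations (for example a discriminator or a BC policy) would need to apply the same convention as the RL policy, otherwise the two see differently shaped inputs.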

AdversarialTrainer: Silent incompatibility with SB3 learning rate schedules

2

h/t @qxcv `AdversarialTrainer.train()` will repeatedly call `PPO.learn(total_timesteps=gen_batch_size, reset_num_timesteps=False)` where `gen_batch_size` is usually a small number compared to conventional RL training. Whether or not `reset_num_timesteps=False`, `PPO` doesn't know the actual number...

shwang
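
To illustrate why chunked training can interact badly with a schedule, here is a simplified model in plain Python. It is not SB3's actual bookkeeping, which depends on `reset_num_timesteps` and the SB3 version. It only assumes the SB3 convention that a schedule is a function of `progress_remaining`, which runs from 1 to 0. The function names and the `per_call_progress` flag are hypothetical.

```python
def linear_schedule(initial_lr):
    """Return a schedule mapping progress_remaining (1 -> 0) to a learning rate."""
    def schedule(progress_remaining):
        return initial_lr * progress_remaining
    return schedule


def learning_rates(schedule, total_timesteps, chunk_size, per_call_progress):
    """Learning rate at every timestep of a run split into fixed-size chunks.

    per_call_progress=False: progress is measured against the whole run.
    per_call_progress=True:  progress is measured against each chunk alone,
    modelling a learner that only knows the budget of the current call.
    """
    lrs = []
    for step in range(total_timesteps):
        if per_call_progress:
            done_fraction = (step % chunk_size) / chunk_size
        else:
            done_fraction = step / total_timesteps
        lrs.append(schedule(1.0 - done_fraction))
    return lrs


schedule = linear_schedule(1.0)
print(learning_rates(schedule, 8, 4, per_call_progress=False))
# [1.0, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125]
print(learning_rates(schedule, 8, 4, per_call_progress=True))
# [1.0, 0.75, 0.5, 0.25, 1.0, 0.75, 0.5, 0.25]
```

In the second case the schedule restarts with every chunk instead of decaying once over the full run, and nothing raises an error, which is what makes this kind of mismatch silent.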

Make wrapped reward returns accessible à la Monitor

Currently `RewardVecEnvWrapper` replaces the reward directly, and internally keeps track of episode return that it logs using the `log_callback`. However, we often apply subsequent wrappers such as `VecNormalize` that change...

AdamGleave
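
One way to picture the request above is sketched below in plain Python, without SB3 or imitation. SB3's `Monitor` reports episode statistics under `info["episode"]` with keys `"r"` and `"l"` when an episode ends; the sketch does the analogous thing for the replaced reward. The class names `ToyEnv` and `WrappedRewardRecorder` and the `"wrapped_episode"` key are hypothetical and do not describe how `RewardVecEnvWrapper` is implemented.

```python
class ToyEnv:
    """Hypothetical stand-in environment: fixed-length episodes, reward 1.0 per step."""

    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done, {}


class WrappedRewardRecorder:
    """Hypothetical wrapper: replaces the reward and exposes the wrapped return in `info`."""

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn  # maps (obs, action, next_obs) to a float
        self._obs = None
        self._return = 0.0
        self._length = 0

    def reset(self):
        self._obs = self.env.reset()
        self._return = 0.0
        self._length = 0
        return self._obs

    def step(self, action):
        next_obs, _env_reward, done, info = self.env.step(action)
        reward = self.reward_fn(self._obs, action, next_obs)
        self._return += reward
        self._length += 1
        self._obs = next_obs
        if done:
            # Copy so the inner environment's dict is not mutated.
            info = dict(info)
            info["wrapped_episode"] = {"r": self._return, "l": self._length}
        return next_obs, reward, done, info


env = WrappedRewardRecorder(ToyEnv(episode_len=3), lambda obs, act, next_obs: 2.0)
env.reset()
done = False
while not done:
    _, _, done, info = env.step(action=0)
print(info["wrapped_episode"])  # {'r': 6.0, 'l': 3}
```

Because the return travels in `info` rather than in the reward itself, a later wrapper that rescales rewards would leave the recorded value untouched.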

Remove awscli dependency

5

Description: Fixes #560. Testing: Ran all tests. Results: 3543 passed, 595 skipped, 4234 warnings.

dfilan

HuggingFace models outdated

8

Bug description: Attempting to load SB3 models from HuggingFace in `serialize.py` often raises a `FileExistsError` that tells us "Outdated policy format: we do not support restoring normalization statistics from...

dfilan

bug
