imitation
Clean PyTorch implementations of imitation and reward learning algorithms
Tune hyperparameters / match implementation details / fix bugs until we replicate the performance of reference implementations of algorithms. I'm not concerned about an exact match -- if we do...
We commented out some type annotations to work around https://github.com/google/pytype/issues/1108 in https://github.com/HumanCompatibleAI/imitation/pull/393
The maximum causal entropy IRL algorithm is implemented in https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/algorithms/mce_irl.py but there is not currently a training script for it in https://github.com/HumanCompatibleAI/imitation/tree/master/src/imitation/scripts
Currently only state-based rewards are supported. Ideally we'd allow state-action based rewards as well. This would be easy from the RewardNet side, but would also require support to calculate state-action...
Currently we don't have any CLI script for behavioral cloning or the density baseline. I envisage this codebase as being particularly useful in being able to rapidly benchmark against a...
At the moment, GAIL and BC don't interoperate well with SB3 in environments with image-based observation spaces. The main problem is the channels axis: many environments put channels last, but...
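To make the channels-axis mismatch concrete, here is a minimal NumPy sketch of converting a channels-last image batch to channels-first. The helper name `channels_last_to_first` and the 84x84x3 batch shape are illustrative assumptions, not part of `imitation` or SB3.

```python
import numpy as np


def channels_last_to_first(obs: np.ndarray) -> np.ndarray:
    """Move the channel axis of an image batch: (N, H, W, C) -> (N, C, H, W).

    Hypothetical helper for illustration only; not an imitation or SB3 API.
    """
    if obs.ndim != 4:
        raise ValueError(f"expected a 4-D batch (N, H, W, C), got shape {obs.shape}")
    return np.transpose(obs, (0, 3, 1, 2))


# A batch of 8 channels-last RGB frames (shape chosen arbitrarily).
batch = np.zeros((8, 84, 84, 3), dtype=np.uint8)
batch[0, 5, 7, 2] = 255  # mark one pixel in channel 2

converted = channels_last_to_first(batch)
```

After the transpose, the pixel written at `[0, 5, 7, 2]` is found at `[0, 2, 5, 7]`. Any code that mixes the two layouts without such a conversion will silently read the wrong axis as channels.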
h/t @qxcv `AdversarialTrainer.train()` will repeatedly call `PPO.learn(total_timesteps=gen_batch_size, reset_num_timesteps=False)` where `gen_batch_size` is usually a small number compared to conventional RL training. Whether or not `reset_num_timesteps=False`, `PPO` doesn't know the actual number...
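The call pattern above can be sketched with a toy learner. This is not SB3 code: `ToyLearner` is hypothetical, and the formula `1 - num_timesteps / target` for "progress remaining" is an assumption made for illustration. The point is only that each call is told the per-call budget `gen_batch_size` and nothing more.

```python
class ToyLearner:
    """Toy stand-in for an RL algorithm with a learn() method (hypothetical)."""

    def __init__(self) -> None:
        self.num_timesteps = 0
        self.progress_log = []  # progress remaining at the start of each call

    def learn(self, total_timesteps: int, reset_num_timesteps: bool = True) -> "ToyLearner":
        if reset_num_timesteps:
            self.num_timesteps = 0
            target = total_timesteps
        else:
            # Assumed behaviour: the per-call budget is added to the running count.
            target = self.num_timesteps + total_timesteps
        # Assumed definition of "progress remaining" for this sketch.
        self.progress_log.append(1.0 - self.num_timesteps / target)
        self.num_timesteps = target  # pretend we trained up to the target
        return self


gen_batch_size = 2048  # small relative to a conventional RL run
learner = ToyLearner()
for _ in range(4):
    learner.learn(total_timesteps=gen_batch_size, reset_num_timesteps=False)

progress_at_call_start = learner.progress_log
```

Under these assumptions the logged values are 1.0, 0.5, 1/3, and 0.25: progress is always measured against a target that moves with every call, so anything keyed to it (such as a learning-rate schedule) does not follow a single, consistent decay over the whole run.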
Currently `RewardVecEnvWrapper` replaces the reward directly, and internally keeps track of episode return that it logs using the `log_callback`. However, we often apply subsequent wrappers such as `VecNormalize` that change...
## Description
Fixes #560.
## Testing
Ran all tests. Results: 3543 passed, 595 skipped, 4234 warnings.
## Bug description
Attempting to load SB3 models from Huggingface in `serialize.py` often raises a `FileExistsError` that tells us "Outdated policy format: we do not support restoring normalization statistics from...