Daniel Filan
(apologies for the bad branch name) Includes: - A new class of CNN reward functions - A reward function wrapper to ensure inputs are always channel-first - Modifications to normalization...
### 🚀 Feature When the reward function is trained by the `CrossEntropyRewardTrainer` in `preference_comparisons.py`, it currently takes a batch of trajectory fragments and preferences, calculates its reward for each pair...
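For context, here is a minimal sketch of a cross-entropy preference loss of the kind this request refers to. It assumes a Bradley-Terry model over summed fragment rewards, which is a common formulation but is not confirmed by the excerpt above. The function name `preference_cross_entropy` and the array shapes are hypothetical, and this is not the `CrossEntropyRewardTrainer` implementation.

```python
import numpy as np


def preference_cross_entropy(rews_a, rews_b, prefs):
    """Cross-entropy loss for pairwise fragment preferences (illustrative only).

    rews_a, rews_b: arrays of shape (batch, fragment_len) holding per-step rewards.
    prefs: array of shape (batch,), the probability that fragment A is preferred.
    """
    # Sum per-step rewards into one return per fragment.
    returns_a = rews_a.sum(axis=1)
    returns_b = rews_b.sum(axis=1)
    # Assumed Bradley-Terry model: P(A preferred) = sigmoid(returns_a - returns_b).
    logits = returns_a - returns_b
    # Numerically stable log-sigmoid: log(sigmoid(x)) = -logaddexp(0, -x).
    log_p_a = -np.logaddexp(0.0, -logits)
    log_p_b = -np.logaddexp(0.0, logits)
    # Average cross-entropy between the stated and modelled preferences.
    return float(-(prefs * log_p_a + (1.0 - prefs) * log_p_b).mean())


rews_a = np.array([[1.0, 1.0], [0.0, 0.0]])
rews_b = np.array([[0.0, 0.0], [0.0, 0.0]])
prefs = np.array([1.0, 0.5])
loss = preference_cross_entropy(rews_a, rews_b, prefs)
```

Working in log space avoids overflow when the two returns differ by a large amount.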
## Description Fixes #560. ## Testing Ran all tests. Results: 3543 passed, 595 skipped, 4234 warnings.
## Bug description Attempting to load SB3 models from Huggingface in `serialize.py` often raises a `FileExistsError` that tells us "Outdated policy format: we do not support restoring normalization statistics from...
## Description Add config options to train CNNs on image environments. This also involves letting configs set environment wrappers, so that environments like Atari can be appropriately wrapped. ## Testing...
## Bug description When installing dependencies in master, seemingly every version of `awscli` is being installed. See [here](https://imgur.com/a/x8JkQt1) for illustration. ## Steps to reproduce Run the following: ``` git clone...
## Bug description The `save` function in `data/types.py` saves sequences of trajectories in numpy's `.npz` format. That said, some scripts save these with the name `final.pkl` or similar, making it...
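A minimal sketch of the mismatch described here, not the project's code. It assumes the archive is written through an open file handle, so the misleading name is kept as-is; the file name, array names, and the helper `looks_like_npz` are hypothetical.

```python
import os
import pickle
import tempfile
import zipfile

import numpy as np


def looks_like_npz(path):
    """Return True if the file is a zip archive, the container used by .npz."""
    return zipfile.is_zipfile(path)


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "final.pkl")  # misleading extension

# Writing through an open handle keeps the given file name unchanged.
with open(path, "wb") as f:
    np.savez_compressed(f, obs=np.zeros((3, 2)), acts=np.ones(2))

is_npz = looks_like_npz(path)

# Treating the file as a pickle fails, because its bytes are a zip archive.
try:
    with open(path, "rb") as f:
        pickle.load(f)
    pickle_ok = True
except Exception:
    pickle_ok = False

# np.load identifies the format from the file contents, not the extension.
with np.load(path) as data:
    keys = sorted(data.files)
```

Checking the contents rather than the extension is one way for a reader to tell the two formats apart.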
## Problem In `algorithms/dagger.py`, there is a function called `_save_dagger_demo` that saves trajectories in `.npz` format. This is redundant with the `save` function in `data/types.py` which also saves trajectories in...
Loosens requirements, allowing newer versions of some dependencies. NB: I tried to follow the advice of running `cd test && pytest --gpu` but got an "unrecognized arguments" error: ```...
Currently, the code requires an old version of pytorch: 1.8.1. Upgrading pytorch to 1.12.1 doesn't break any tests, so the requirement should probably be loosened.