Adam Gleave
There's a GitHub release for 0.8.3, but it doesn't seem to have made its way to PyPI yet. @Qwlouse @thequilo are there any plans to make a PyPI release soon?...
When using `Memory.cache`, Joblib writes out a metadata file for each cached function call, including its input arguments. While Joblib uses an efficient NumPy pickler for *output* arrays, it calls...
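For context, a minimal sketch of the usage pattern in question, with a large NumPy array as the *input* argument; the function name and array sizes are illustrative only:

```python
import numpy as np
from joblib import Memory

# Cache results on disk; each call also produces a per-call metadata file
# describing its input arguments, which is the overhead at issue here.
memory = Memory(location="./joblib_cache", verbose=0)

@memory.cache
def normalize(batch: np.ndarray) -> np.ndarray:
    """Toy function whose input is a large array."""
    return (batch - batch.mean()) / batch.std()

big_input = np.random.default_rng(0).random((10_000, 1_000))
normalize(big_input)  # first call: computes, writes cache entry + metadata
normalize(big_input)  # second call: served from the cache
```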
`pytype` version `2021.5.11` introduced a regression that is still present in the latest version `2022.01.13`: overloaded type annotations that involve a type variable are omitted from the generated `pyi` file....
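To make the report concrete, here is a hedged sketch of the kind of signature affected: an `@overload`-decorated function parameterized by a `TypeVar`. Whether this exact snippet reproduces the regression is an assumption; it only illustrates the shape of annotation involved:

```python
from typing import Sequence, TypeVar, overload

T = TypeVar("T")

@overload
def pick(items: Sequence[T]) -> T: ...
@overload
def pick(items: Sequence[T], index: int) -> T: ...
def pick(items, index=0):
    # Runtime implementation backing both overloads; per the report,
    # the overloaded signatures are dropped from the generated .pyi.
    return items[index]
```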
Currently we have a GitHub workflow that uploads to Test PyPI on every commit, and to the real PyPI on each release. The idea of the Test PyPI automation is to...
https://github.com/HumanCompatibleAI/imitation/pull/484 removes the import of `imitation.envs.examples` from `src/imitation/scripts/__init__.py` to work around https://github.com/sphinx-doc/sphinx/issues/9069. Some options:

1. Move this example env code out of the repo. It was never a great fit for...
Once https://github.com/DLR-RM/stable-baselines3/pull/979 is merged and makes it into the next release, change `setup.py` to use the new SB3 version without checking whether it is running on Windows. A sketch of the intended simplification is below.
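The version numbers and the exact shape of the current check are placeholders, not the real pins in `setup.py`:

```python
import sys

# Hypothetical current state: a Windows-specific pin pending the upstream fix.
if sys.platform == "win32":
    SB3_REQUIREMENT = "stable-baselines3==1.4.0"  # placeholder pin
else:
    SB3_REQUIREMENT = "stable-baselines3>=1.4.0"  # placeholder pin

# Once the SB3 release containing the fix is out, this collapses to a
# single unconditional requirement:
SB3_REQUIREMENT = "stable-baselines3>=1.5.0"  # placeholder version
```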
The algorithm documentation is currently just converted directly from docstrings and is a method-by-method description, e.g. for [behavioral cloning](https://imitation.readthedocs.io/en/latest/algorithms/bc.html). It would be more user-friendly to include some high-level description...
Add a guide, e.g. `docs/guide/developer.rst`, summarizing the internals of `imitation` and giving guidelines for new developers, building off [CONTRIBUTING.md](https://github.com/HumanCompatibleAI/imitation/blob/master/CONTRIBUTING.md). Examples of developer guides: [SB3](https://stable-baselines3.readthedocs.io/en/master/guide/developer.html), [NumPy](https://numpy.org/doc/stable/dev/) and [pandas](https://pandas.pydata.org/docs/development/index.html) (last two are more...
The [DRLHP paper](https://arxiv.org/pdf/1706.03741.pdf) states in section 2.2.3:

> A fraction of 1/e of the data is held out to be used as a validation set for each predictor. We...
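A minimal sketch of that holdout scheme, assuming each ensemble member draws its own random 1/e validation split from the comparison dataset (the names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_comparisons = 1_000  # size of the preference/comparison dataset
n_members = 3          # reward-model ensemble size
n_val = round(n_comparisons / np.e)  # the quoted 1/e holdout fraction

for member in range(n_members):
    perm = rng.permutation(n_comparisons)
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    # Train ensemble member `member` on train_idx; use val_idx as that
    # predictor's validation set (e.g. for early stopping or tuning).
```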
The original [deep RL from human preferences paper](https://arxiv.org/pdf/1706.03741.pdf) uses an ensemble of reward models. It then selects queries for comparison that have the highest disagreement between models, a proxy for...
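A hedged sketch of disagreement-based query selection, assuming each ensemble member outputs a predicted preference probability per candidate pair; the function and array names are illustrative, not `imitation`'s API:

```python
import numpy as np

def select_queries(member_probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k candidate pairs the ensemble disagrees on most.

    member_probs: (n_members, n_candidates) array of P(left fragment
    preferred), one row per reward-model ensemble member.
    """
    disagreement = member_probs.var(axis=0)  # variance across members
    return np.argsort(-disagreement)[:k]     # indices of top-k variance

# Example: 5 ensemble members scoring 100 candidate pairs.
probs = np.random.default_rng(0).uniform(size=(5, 100))
queries = select_queries(probs, k=10)
```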