Adam Gleave
There's a GitHub release for 0.8.3, but it doesn't seem to have made its way to PyPI yet. @Qwlouse @thequilo are there any plans to make a PyPI release soon?...
When using `Memory.cache`, Joblib writes out a metadata file for each cached function call, including its input arguments. While Joblib uses an efficient NumPy pickler for *output* arrays, it calls...
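For context, a minimal sketch of the usage pattern in question, with a large NumPy array as the *input* argument; the function name and array sizes are illustrative only:

```python
import numpy as np
from joblib import Memory

# Cache results on disk; each call also produces a per-call metadata file
# describing its input arguments, which is the overhead at issue here.
memory = Memory(location="./joblib_cache", verbose=0)

@memory.cache
def normalize(batch: np.ndarray) -> np.ndarray:
    """Toy function whose input is a large array."""
    return (batch - batch.mean()) / batch.std()

big_input = np.random.default_rng(0).random((10_000, 1_000))
normalize(big_input)  # first call: computes, writes cache entry + metadata
normalize(big_input)  # second call: served from the cache
```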
`pytype` version `2021.5.11` introduced a regression that is still present in the latest version `2022.01.13`: overloaded type annotations that involve a type variable are omitted from the generated `pyi` file....
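To make the report concrete, here is a hedged sketch of the kind of signature affected: an `@overload`-decorated function parameterized by a `TypeVar`. Whether this exact snippet reproduces the regression is an assumption; it only illustrates the shape of annotation involved:

```python
from typing import Sequence, TypeVar, overload

T = TypeVar("T")

@overload
def pick(items: Sequence[T]) -> T: ...
@overload
def pick(items: Sequence[T], index: int) -> T: ...
def pick(items, index=0):
    # Runtime implementation backing both overloads; per the report,
    # the overloaded signatures are dropped from the generated .pyi.
    return items[index]
```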
Currently we have a GitHub workflow that uploads to Test PyPI on every commit, and to the real PyPI on each release. The idea of the Test PyPI automation is to...
https://github.com/HumanCompatibleAI/imitation/pull/484 removes the import of `imitation.envs.examples` from `src/imitation/scripts/__init__.py` to work around https://github.com/sphinx-doc/sphinx/issues/9069. Some options:

1. Move this example env code out of the repo. It was never a great fit for...
Once https://github.com/DLR-RM/stable-baselines3/pull/979 is merged and makes it into the next release, change `setup.py` to use the new SB3 version without checking whether it is running on Windows. A sketch of the intended simplification is below.
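The version numbers and the exact shape of the current check are placeholders, not the real pins in `setup.py`:

```python
import sys

# Hypothetical current state: a Windows-specific pin pending the upstream fix.
if sys.platform == "win32":
    SB3_REQUIREMENT = "stable-baselines3==1.4.0"  # placeholder pin
else:
    SB3_REQUIREMENT = "stable-baselines3>=1.4.0"  # placeholder pin

# Once the SB3 release containing the fix is out, this collapses to a
# single unconditional requirement:
SB3_REQUIREMENT = "stable-baselines3>=1.5.0"  # placeholder version
```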
The algorithm documentation is currently just converted directly from docstrings and is a method-by-method description, e.g. for [behavioral cloning](https://imitation.readthedocs.io/en/latest/algorithms/bc.html). It would be more user-friendly to include some high-level description...
Add a guide, e.g. `docs/guide/developer.rst`, summarizing the internals of `imitation` and giving guidelines for new developers, building off [CONTRIBUTING.md](https://github.com/HumanCompatibleAI/imitation/blob/master/CONTRIBUTING.md). Examples of developer guides: [SB3](https://stable-baselines3.readthedocs.io/en/master/guide/developer.html), [NumPy](https://numpy.org/doc/stable/dev/) and [pandas](https://pandas.pydata.org/docs/development/index.html) (last two are more...
The [DRLHP paper](https://arxiv.org/pdf/1706.03741.pdf) states in section 2.2.3:

> A fraction of 1/e of the data is held out to be used as a validation set for each predictor. We...
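A minimal sketch of that holdout scheme, assuming each ensemble member draws its own random 1/e validation split from the comparison dataset (the names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_comparisons = 1_000  # size of the preference/comparison dataset
n_members = 3          # reward-model ensemble size
n_val = round(n_comparisons / np.e)  # the quoted 1/e holdout fraction

for member in range(n_members):
    perm = rng.permutation(n_comparisons)
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    # Train ensemble member `member` on train_idx; use val_idx as that
    # predictor's validation set (e.g. for early stopping or tuning).
```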
The original [deep RL from human preferences paper](https://arxiv.org/pdf/1706.03741.pdf) uses an ensemble of reward models. It then selects queries for comparison that have the highest disagreement between models, a proxy for...
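A hedged sketch of disagreement-based query selection, assuming each ensemble member outputs a predicted preference probability per candidate pair; the function and array names are illustrative, not `imitation`'s API:

```python
import numpy as np

def select_queries(member_probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k candidate pairs the ensemble disagrees on most.

    member_probs: (n_members, n_candidates) array of P(left fragment
    preferred), one row per reward-model ensemble member.
    """
    disagreement = member_probs.var(axis=0)  # variance across members
    return np.argsort(-disagreement)[:k]     # indices of top-k variance

# Example: 5 ensemble members scoring 100 candidate pairs.
probs = np.random.default_rng(0).uniform(size=(5, 100))
queries = select_queries(probs, k=10)
```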