imitation issues

Use CSV of latest results for benchmarking

See https://github.com/HumanCompatibleAI/imitation/pull/657 - replace the hardcoded paper results with a CSV of results that we can compare and update. Also, include instructions for updating the CSV, and a comparison between paper...

timbauman
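
A minimal sketch of the idea in this issue: keep reference results in a CSV and compare a new run against them. The column names (`algo`, `env`, `mean_return`), the helper functions, the tolerance, and every value below are hypothetical placeholders, not the project's actual CSV schema or real benchmark numbers.

```python
import csv
import io

# Placeholder data: the column names, algorithm/environment names, and numbers
# are invented for illustration and are NOT real benchmark results.
BASELINE_CSV = """algo,env,mean_return
algo_x,env_a,100.0
algo_x,env_b,50.0
"""


def load_results(text):
    """Parse CSV text into a {(algo, env): mean_return} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["algo"], row["env"]): float(row["mean_return"]) for row in reader}


def compare(baseline, latest, tolerance=0.05):
    """List keys whose latest return is more than `tolerance` (relative) below baseline."""
    regressions = []
    for key, base in baseline.items():
        new = latest.get(key)
        if new is None:
            continue  # no new measurement for this pair, nothing to compare
        if new < base - tolerance * abs(base):
            regressions.append(key)
    return regressions


baseline = load_results(BASELINE_CSV)
latest = {("algo_x", "env_a"): 101.0, ("algo_x", "env_b"): 40.0}
print(compare(baseline, latest))
```

Updating the reference then amounts to rewriting the CSV rather than editing numbers embedded in code.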

Change adversarial algorithms to collect rollouts first

2

## Description This PR changes the adversarial algorithm so that, at any iteration, the rollouts are collected first, then the discriminator is trained, and then the generator is trained....

taufeeque9
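
The ordering described in this PR can be sketched as a plain loop. This only illustrates the sequence of steps; `train_adversarial` and its callback parameters are hypothetical names, not the `imitation` API.

```python
from typing import Callable, List


def train_adversarial(
    n_iterations: int,
    collect_rollouts: Callable[[], list],
    train_discriminator: Callable[[list], None],
    train_generator: Callable[[], None],
) -> None:
    """Run the three phases in the order the PR describes (hypothetical helper)."""
    for _ in range(n_iterations):
        rollouts = collect_rollouts()   # 1. collect rollouts first
        train_discriminator(rollouts)   # 2. then train the discriminator on them
        train_generator()               # 3. finally train the generator


# Stub callbacks that only record the order in which they are called.
log: List[str] = []
train_adversarial(
    n_iterations=2,
    collect_rollouts=lambda: log.append("collect") or [],
    train_discriminator=lambda rollouts: log.append("disc"),
    train_generator=lambda: log.append("gen"),
)
print(log)
```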

Refactor trajectory handling

I think we should refactor the way we handle demonstrations inside of `imitation`. Skimming over the code, it looks like we spend far too many LOC on supporting and converting...

ernestum

enhancement

Discriminator output for Fu's AIRL paper is wrong

1

## Problem The discriminator in AIRL here is just a regular one; it does not correspond to the one in Fu's paper, which is robust to changes in dynamics. ## Solution As stated...

Dormiveglia-elf

enhancement
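
For reference, the discriminator in the AIRL paper (Fu et al.) takes the form D(s, a, s') = exp(f) / (exp(f) + π(a|s)), where, in the state-only variant, f(s, a, s') = g(s) + γ·h(s') − h(s). The sketch below evaluates that form on scalar inputs. The function names are hypothetical; it makes no claim about what `imitation` currently implements or about the fix the issue proposes.

```python
import math


def airl_f(g_s: float, h_s: float, h_next: float, gamma: float) -> float:
    """f(s, a, s') = g(s) + gamma * h(s') - h(s): reward term plus shaping term."""
    return g_s + gamma * h_next - h_s


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """D = exp(f) / (exp(f) + pi(a|s)), computed as sigmoid(f - log pi(a|s)).

    Not numerically hardened: very negative logits can overflow math.exp.
    """
    logit = f_value - log_pi
    return 1.0 / (1.0 + math.exp(-logit))


# With f = 0 and pi(a|s) = 0.5, D = 1 / (1 + 0.5).
d = airl_discriminator(0.0, math.log(0.5))
print(d)
```

Writing D as a sigmoid of f − log π means the policy's log-probability enters the logit directly, which is the main difference from a discriminator that outputs an unconstrained logit.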

Approximate a finite horizon environment?

Hello 👋 This is a question, not a feature request - I hope that's alright. I understand that this repo doesn't support infinite horizon episodes. The gridworld environment I want...

kierad

enhancement

Confused about the process of "Trains the generator to maximize the discriminator loss"

2

## Problem Hi, `imitation` is a great project! Currently, I am training the GAIL algorithm, and the learner network is PPO from SB3. I have questions about the training process...

Liuzy0908

enhancement

Support human preferences in “Deep RL from human preferences” (RLHP) implementation

4

Our team [KABasalt](https://github.com/BASALT-2022-Karlsruhe) participated in last year's BASALT competition and we noticed that RLHP currently lacks support for human preferences. ## Problem: Only synchronous, synthetic preference gathering is supported by...

mschweizer

enhancement

DAgger Refactoring

## Problem The current DAgger implementation is split into a `DAggerTrainer` and a `SimpleDAggerTrainer`. The split is mostly arbitrary. Also, the dependency on BC is far too deep. ##...

ernestum

enhancement

save videos during training

2

## Description This addresses [Issue #523](https://github.com/HumanCompatibleAI/imitation/issues#:~:text=Add%20support%20for%20saving%20videos%20of%20policies%20on%20a%20environment%20for%20evaluation%20during%20and%20after%20training) to automatically save videos during training time. This builds off of the following, earlier [PR](https://github.com/HumanCompatibleAI/imitation/pull/524/files#diff-cc891c802ce6c8a2e1fc96fc67e50e08a5e7f3158f6b35cd41d783b0744b26dd). Known Limitations: (1) Will not necessarily save a...

samuelarnesen

Embed code examples in docs

3

## Problem https://github.com/HumanCompatibleAI/imitation/pull/603#discussion_r1011673467 > I guess if we had code examples that are embedded directly from other files this would (a) solve the issue of testing docs separately, (b) solve...

Rocamonde

enhancement
