Clodéric Mars
- Convergence problem on PPO is fixed
- Still a computation performance problem to be fixed
- TODO ->
  - Multiple environment single agent
  - Multiple environment multiple agent
It's in #71
Needs to be rebased; @saikrishna-1996 to do it.
- Forward part is working
- TODO -> Merge request and test on other environments.
Maybe investigate https://hydra.cc
## What's done
- Draft done
- Refactoring on reinforce ongoing to clean up the code

## Todo
- Finalize doc and do PR.

We should include methods that use a continuous action space.
- (Ongoing) Human driven exploration with µ0 Unplugged - IL / BC -> Basic Behavior Cloning / GAIL / [Primal Wasserstein Imitation Learning](https://openreview.net/forum?id=TtYSU29zgR) / [Domain-Robust Visual Imitation Learning with Mutual...
Thank you for your interest. We will take a look at your suggestion and update the documentation.
e.g. - Every N trials, - Archive the latest model version. - Run a specific set of trials on this version just for validation (no replay buffer update, no new...
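
The periodic validation idea above could be sketched roughly as follows. This is only an illustration under assumptions: `ValidationScheduler`, `on_trial_end`, and `run_validation_trial` are hypothetical names, not part of any existing API, and the model is assumed to be deep-copyable. The validation callback only receives the archived snapshot, so it has no access to a replay buffer to update.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class ValidationScheduler:
    """Hypothetical helper: every N trials, archive the model and validate it."""

    every_n_trials: int      # N: how often validation is triggered
    validation_trials: int   # number of validation trials per archived version
    archive: List[Tuple[int, Any]] = field(default_factory=list)
    scores: List[Tuple[int, float]] = field(default_factory=list)
    trials_seen: int = 0

    def on_trial_end(
        self, model: Any, run_validation_trial: Callable[[Any], float]
    ) -> Optional[float]:
        """Call once after each training trial; returns a mean score when validating."""
        self.trials_seen += 1
        if self.trials_seen % self.every_n_trials != 0:
            return None

        # Archive a frozen copy so later training does not change this version.
        snapshot = copy.deepcopy(model)
        self.archive.append((self.trials_seen, snapshot))

        # Validation only: the callback gets the snapshot and returns a score.
        # Nothing here writes to a replay buffer or updates the live model.
        rewards = [run_validation_trial(snapshot) for _ in range(self.validation_trials)]
        mean_reward = sum(rewards) / len(rewards)
        self.scores.append((self.trials_seen, mean_reward))
        return mean_reward
```

In a training loop, `on_trial_end(model, run_validation_trial)` would be called after each trial, and `scores` would then hold one `(trial_count, mean_reward)` pair per archived version.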