Clodéric Mars comments

Results: 23 comments of Clodéric Mars

PPO implementation with continuous action space

- Convergence problem on PPO is fixed
- Still a computation performance problem to be fixed
- TODO ->
  - Multiple environment single agent
  - Multiple environment multiple agent
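
For reference, the two pieces that usually matter when moving PPO to a continuous action space are the log-probability of the action under the policy and the clipped surrogate objective. The following is a minimal sketch, assuming a diagonal Gaussian policy; the function names `gaussian_log_prob` and `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions, not taken from the implementation discussed here.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian policy, summed over action dimensions.

    All three arguments are sequences of floats of the same length
    (hypothetical interface, chosen for illustration only).
    """
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        total += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def ppo_clipped_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate (a quantity to maximize)."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In a real training loop these would be computed in batch with a tensor library; the scalar version only makes the arithmetic explicit.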

Streamline the process orchestration to provide clear entry point for algorithm implementation

It's in #71

Add a transition screen between trials

Needs to be rebased; @saikrishna-1996 to do it.

Integrate HuggingFace / SB3 agents

- Forward part is working
- TODO -> Merge request and test on other environments.

Split the `run_params.yaml`

Maybe investigate https://hydra.cc

Add documentation for how to write your own agent / agent adapter

## What's done

- Draft done
- Refactoring on reinforce ongoing to clean up the code

## Todo

- Finalize doc and do PR.

Implement further HILL techniques

We should include methods that use a continuous action space

Implement further HILL techniques

- (Ongoing) Human driven exploration with µ0 Unplugged
- IL / BC -> Basic Behavior Cloning / GAIL / [Primal Wasserstein Imitation Learning](https://openreview.net/forum?id=TtYSU29zgR) / [Domain-Robust Visual Imitation Learning with Mutual...

Use viztracer in a preferred way

Thank you for your interest, we will take a look at your suggestion and update the documentation.

Showcase how to do validation in a run implementation

e.g.

- Every N trials,
  - Archive the latest model version.
  - Run a specific set of trials on this version just for validation (no replay buffer update, no new...
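
The steps listed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `run_training_trial` and `run_validation_trial` are hypothetical callables supplied by the run implementation, the model is anything `copy.deepcopy` can snapshot, and "archiving" is reduced to keeping the snapshot in a list rather than pushing it to a model registry.

```python
import copy


def run_with_periodic_validation(
    model,
    replay_buffer,
    num_trials,
    validate_every,
    validation_seeds,
    run_training_trial,
    run_validation_trial,
):
    """Train for num_trials trials, validating a frozen snapshot every N trials.

    Returns a list of (trial_index, model_snapshot, mean_validation_score).
    """
    archive = []
    for trial_idx in range(1, num_trials + 1):
        # Training trial: allowed to update the model and the replay buffer.
        run_training_trial(model, replay_buffer)

        if trial_idx % validate_every == 0:
            # Archive the latest model version as an independent copy.
            snapshot = copy.deepcopy(model)
            # Validation trials only read the snapshot; the replay buffer
            # is deliberately not passed in, so it cannot be updated.
            scores = [run_validation_trial(snapshot, seed) for seed in validation_seeds]
            archive.append((trial_idx, snapshot, sum(scores) / len(scores)))
    return archive
```

Using a fixed list of validation seeds keeps the validation set identical across model versions, so the scores stay comparable from one checkpoint to the next.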