

Task-Agnostic Continual RL: In Praise of a Simple Baseline

Table of Contents
  1. About The Project
    • Built With
  2. Structure
  3. Installation
  4. Usage
    • Paper Reproduction
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

Official codebase for the paper Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges. The code can be used to run continual RL (or multi-task RL) experiments in Meta-World (e.g. in Continual-World), as well as large-scale studies on the synthetic Quadratic Optimization benchmark. The baselines, including replay-based recurrent RL (3RL) and ER-TX, are modular enough to be ported into other codebases as well.

If you have found the paper or codebase useful, consider citing the work:

@article{caccia2022task,
  title={Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges},
  author={Caccia, Massimo and Mueller, Jonas and Kim, Taesup and Charlin, Laurent and Fakoor, Rasool},
  journal={arXiv preprint arXiv:2205.14495},
  year={2022}
}

(back to top)

Built With

  • Python (3.8)
  • PyTorch
  • Sequoia (w/ the Meta-World add-on)
  • mujoco_py / MuJoCo

(back to top)

Structure

├── code
    ├── algs/SAC
        ├── sac.py                   # training SAC   
    ├── configs
        ├── hparams                  # hyperparameter config files
        ├── methods                  # methods' config files
        ├── settings                 # settings config files
    ├── misc
        ├── buffer.py                # sampling data from buffer
        ├── runner_offpolicy.py      # agent sampling data from the env
        ├── sequoia_envs.py          # creating the environments
        ├── utils.py
    ├── models
        ├── networks.py              # creates the neural networks
    ├── scripts
        ├── test_codebase.py         # makes sure the repo runs correctly
    ├── train_and_eval
        ├── eval.py                  # evaluation logic
        ├── train_cl.py              # training logic in CL
        ├── train_mtl.py             # training logic in MTL
    ├── main.py                      # main file for CRL experiments
├── public_ck                  
    ├── ...                          # checkpoints in CW10 benchmark

(back to top)

Installation

Essentially, what you need is:

  • python (3.8)
  • sequoia
  • mujoco_py
  • pytorch

Installing mujoco_py and running Meta-World can be quite tricky. For this reason, we've used Sequoia to create the continual reinforcement learning environments.

Here's how you can install the dependencies on macOS (Big Sur):

  1. Create an env, ideally w/ conda
conda create -n tacrl python=3.8
conda activate tacrl
  2. Install Sequoia w/ the Meta-World add-on
pip install "sequoia[metaworld] @ git+https://www.github.com/lebrice/Sequoia.git@pass_seed_to_metaworld_envs"
  3. Install the extra requirements
pip install -r requirements.txt
  4. Install MuJoCo and the MuJoCo key. You will need MuJoCo version >= 2.1.

UPDATE: I haven't reinstalled MuJoCo since the DeepMind acquisition and refactoring. Best of luck w/ the installation.

You can follow the RoboSuite installation instructions if you stumble on GCC-related bugs (macOS specific).

For GCC / GL/glew.h related errors, you can use the instructions here
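
Once everything is installed, a quick sanity check is to import the main dependencies. The snippet below is only a minimal sketch (it assumes the standard import names for each package); you can also run code/scripts/test_codebase.py to make sure the repo runs correctly.

# install_check.py -- minimal sketch: verify that the main dependencies import
import importlib

for name in ("torch", "mujoco_py", "metaworld", "sequoia"):
    try:
        module = importlib.import_module(name)
        print(f"{name}: OK ({getattr(module, '__version__', 'unknown version')})")
    except Exception as err:  # mujoco_py in particular can fail at first import while compiling
        print(f"{name}: FAILED -> {err}")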

Contact us if you have any problems!

(back to top)

Usage

Example of running SAC w/ an RNN in CW10:

python code/main.py --train_mode cl --context_rnn True --setting_config CW10 --lr 0.0003 --batch_size 1028

or w/ a transformer in Quadratic Optimization, in a multi-task regime:

python code/main.py --train_mode mtl --context_transformer True --env_name Quadratic_opt --lr 0.0003 --batch_size 128
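
To list all available flags, you can ask main.py for its help message (assuming it exposes a standard argparse-style --help):

python code/main.py --help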

You can pass config files and reproduce the paper's results by combining a setting, a method, and a hyperparameter config file in the following manner:

python code/main.py --train_mode <cl, mtl> --setting_config <setting> --method_config <method> --hparam_config <hyperparameters>

e.g. running 3RL in CW10 w/ the hyperparameters prescribed by Meta-World (for Meta-World v2):

python code/main.py --train_mode cl --setting_config CW10 --method_config 3RL --hparam_config meta_world

For the MTRL experiments, run

python code/main.py --train_mode mtl

For prototyping, you can use the Ant_direction environment:

python code/main.py --env_name Ant_direction-v3

Note: If you get an error about "yaml.safe_load", replace it with "yaml.load()".
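
In code, that swap looks roughly like the sketch below. The config path is purely illustrative (not necessarily a file shipped with the repo), and newer PyYAML versions require an explicit Loader argument for yaml.load:

import yaml

# Illustrative path only -- point this at whichever config file triggers the error
with open("code/configs/settings/CW10.yaml") as f:
    # yaml.safe_load(f) is the usual call; if it errors on your PyYAML version,
    # fall back to yaml.load with an explicit loader
    cfg = yaml.load(f, Loader=yaml.FullLoader)
print(cfg)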

Paper Reproduction

For access to the WandB project w/ the results, please contact me.

For Figure 1, use analyse_reps.py (the models are in public_ck)

For all synthetic data experiments, you can create a wandb sweep

wandb sweep --project Quadratic_opt code/configs/sweeps/Quadratic_opt.yaml

And then launch the wandb agent

wandb agent <sweep_id>
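
Once the agents have finished, the sweep's runs can also be inspected programmatically through the wandb public API. The sketch below is not part of the repo; the entity and sweep id are placeholders:

import wandb

# Substitute your own entity and the sweep id printed by wandb sweep
api = wandb.Api()
sweep = api.sweep("<entity>/Quadratic_opt/<sweep_id>")

for run in sweep.runs:
    # run.config holds the run's hyperparameters; run.state its status
    print(run.name, run.state, run.config)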

For Figure 5

python main.py --train_mode cl --setting_config <CW10, CL_MW20_500k>  --method_config <method> --hparam_config meta_world

For Figure 7

python main.py --train_mode mtl --setting_config MTL_MW20  --method_config <method> --hparam_config meta_world-noet

(back to top)

License

This project is licensed under the Apache-2.0 License.

Contact

Please open an issue on the [issues tracker](https://github.com/amazon-research/replay-based-recurrent-rl/issues) to report problems or ask questions, or send an email to Massimo Caccia - @MassCaccia and Rasool Fakoor.

(back to top)