Choreographer: Learning and Adapting Skills in Imagination
This is the code for Choreographer: a model-based agent that discovers and learns unsupervised skills in latent imagination, and is able to efficiently coordinate and adapt the skills to solve downstream tasks.
If you find the code useful, please cite our work using:
@inproceedings{
Mazzaglia2023Choreographer,
title={Choreographer: Learning and Adapting Skills in Imagination},
author={Pietro Mazzaglia and Tim Verbelen and Bart Dhoedt and Alexandre Lacoste and Sai Rajeswar},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=PhkWyijGi5b}
}
Requirements
We assume you have access to a GPU that can run CUDA 10.2 and CUDNN 8. Then, the simplest way to install all required dependencies is to create an anaconda environment by running
conda env create -f conda_env.yml
After the installation ends you can activate your environment with
conda activate choreo
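To quickly check that the environment sees your GPU, you can run a minimal sanity check (this assumes PyTorch is among the installed dependencies, which is consistent with the .pt snapshots used later in this README):
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
If this prints False, revisit your CUDA/CUDNN setup before launching training.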
Implemented Agents
Agent | Command |
---|---|
Choreographer | agent=choreo |
Available Domains
We support the following domains.
Domain | Tasks |
---|---|
walker | stand, walk, run, flip |
quadruped | walk, run, stand, jump |
jaco | reach_top_left, reach_top_right, reach_bottom_left, reach_bottom_right |
mw | reach |
Instructions
Offline Datasets
The datasets from the ExORL paper can be downloaded from their repository. The default loading directory is:
~/urlb_datasets/${dataset}/${domain}/${collection_method}/buffer
For example:
~/urlb_datasets/exorl/walker/rnd/buffer
The dataset folder can be changed in the offline_train.yaml file, by replacing the value of the dataset_dir key, or it can be provided as a command-line argument when launching the offline pre-training.
For example:
python offline_train.py ... dataset_dir=$MYPATH/exorl/walker/rnd/buffer
Offline Pre-training
To run pre-training from offline data, use the offline_train.py script:
python offline_train.py configs=dmc_states agent=choreo dataset=exorl collection_method=rnd domain=walker seed=1
This script will produce several agent snapshots after training for 10k, 50k, 100k, and 200k update steps. The snapshots will be stored under the following directory:
./offline_models/${dataset}/${collection_method}/${domain}/${agent.name}/${seed}
For example:
./offline_models/exorl/rnd/walker/choreo/1
Pre-training in parallel with exploration
To run pre-training in parallel to (LBS) exploration, use the pretrain.py script:
python pretrain.py configs=dmc_pixels agent=choreo domain=walker
This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:
./pretrained_models/${obs_type}/${domain}/${agent.name}/${seed}
For example:
./pretrained_models/pixels/walker/choreo/1
Fine-tuning
Once you have pre-trained your agent, you can fine-tune it on a downstream task.
From offline dataset
For example, let's say you have pre-trained Choreographer on the walker domain ExORL data; you can fine-tune it on walker_run by running the following command:
python finetune.py configs=dmc_states agent=choreo task=walker_run from_offline=True dataset=exorl collection_method=rnd snapshot_ts=200000 seed=1
This will load the snapshot stored in ./offline_models/exorl/rnd/walker/choreo/1/snapshot_200000.pt, initialize Choreographer's models and skill policies with it, and start training on walker_run using the meta-controller to maximize the extrinsic rewards of the task.
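To compare downstream performance across pre-training durations, you can sweep the snapshot_ts argument over the snapshots produced by the offline pre-training script (a sketch, assuming all four snapshots exist under ./offline_models/exorl/rnd/walker/choreo/1):
# Hypothetical sweep over the offline pre-training checkpoints
for ts in 10000 50000 100000 200000; do
  python finetune.py configs=dmc_states agent=choreo task=walker_run from_offline=True dataset=exorl collection_method=rnd snapshot_ts=$ts seed=1
done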
From parallel exploration
For example, let's say you have pre-trained Choreographer on the walker domain; you can fine-tune it on walker_run by running the following command:
python finetune.py configs=dmc_pixels agent=choreo task=walker_run snapshot_ts=2000000 seed=1
This will load the snapshot stored in ./pretrained_models/pixels/walker/choreo/1/snapshot_2000000.pt, initialize Choreographer's models and skill policies with it, and start training on walker_run using the meta-controller to maximize the extrinsic rewards of the task.
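To average results over several runs, you can sweep the seed argument (a sketch, assuming a pre-trained snapshot exists under ./pretrained_models/pixels/walker/choreo/&lt;seed&gt; for each seed):
# Hypothetical multi-seed fine-tuning sweep
for seed in 1 2 3; do
  python finetune.py configs=dmc_pixels agent=choreo task=walker_run snapshot_ts=2000000 seed=$seed
done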
Zero-shot
Jaco
To perform zero-shot evaluation on Jaco, ensure that you pre-trained the model in the pixel settings and that it is correctly stored under pretrained_models. Then, you can run:
python finetune.py configs=dmc_pixels task=$JACO_TASK snapshot_ts=2000000 num_train_frames=10 num_eval_episodes=100 eval_goals=True
where $JACO_TASK is one of the Jaco reach tasks.
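To evaluate all four reach goals in one go, you can loop over the tasks (a sketch, assuming the Jaco tasks follow the domain_task naming used elsewhere in this README, e.g. jaco_reach_top_left):
# Hypothetical loop over the four Jaco reach tasks
for task in jaco_reach_top_left jaco_reach_top_right jaco_reach_bottom_left jaco_reach_bottom_right; do
  python finetune.py configs=dmc_pixels task=$task snapshot_ts=2000000 num_train_frames=10 num_eval_episodes=100 eval_goals=True
done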
Meta-World
To pre-train on the Meta-World reach environment, you can run the following command:
python pretrain.py configs=mw_pixels domain=mw
To perform zero-shot evaluation on Meta-World, ensure that you pre-trained the model and that it is correctly stored under pretrained_models. Then, you can run:
python finetune.py configs=mw_pixels task=mw_reach snapshot_ts=2000000 task_id=$TASK_ID num_train_frames=10 num_eval_episodes=100 eval_goals=True agent.update_skill_every_step=50
where $TASK_ID is a value in range(0,50). The goals are stored under mw_tasks/reach_harder. There are 10 sets of goals, which can be used by setting the evaluation seed, e.g. seed=0.
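To cover all 50 goal positions, you can sweep the task_id argument (a sketch, assuming the 2M-frame Meta-World snapshot produced by the pre-training command above):
# Hypothetical sweep over all Meta-World reach goals
for TASK_ID in $(seq 0 49); do
  python finetune.py configs=mw_pixels task=mw_reach snapshot_ts=2000000 task_id=$TASK_ID num_train_frames=10 num_eval_episodes=100 eval_goals=True agent.update_skill_every_step=50
done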
Monitoring
Tensorboard
Logs are stored in the exp_local folder. To launch tensorboard run:
tensorboard --logdir exp_local
The console output is also available in the form:
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
A training entry decodes as:
F : total number of environment frames
S : total number of agent steps
E : total number of episodes
R : episode return
FPS: training throughput (frames per second)
T : total training time
Weights and Biases (wandb)
You can also use Weights and Biases, by launching the experiments with use_wandb=True.
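For example, combining the pre-training command above with the flag:
python pretrain.py configs=dmc_pixels agent=choreo domain=walker use_wandb=True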
Acknowledgments
The environment implementations come from URLB. The model implementation is inspired by DreamerV2. We also thank the authors of MetaWorld.