text2reward
[ICLR 2024] Code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning"
Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
This repository contains the code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning". Please refer to our project page for more demonstrations and up-to-date related resources.
Updates
Dependencies
To set up the environment, run the following commands in a shell:
# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0
Troubleshooting
- If you have not installed mujoco yet, please follow the instructions from here to install it. Then run the following commands to confirm that the installation succeeded:
$ python3
>>> import mujoco_py
- If you encounter any of the following errors when running ManiSkill2, please refer to the documentation here:
  - RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
  - Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
  - Segmentation fault (core dumped)
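As a quick complement to the checks above, the following sketch reports which of the packages from the Dependencies section cannot be located by Python. The module names in MODULES are assumptions inferred from the install commands (for example, mani_skill2 for ManiSkill2), and the helper missing_modules is hypothetical, not part of this repository. Note that it only locates packages without importing them, so running import mujoco_py as shown above remains the stronger test.

```python
import importlib.util

# Assumed import names for the packages installed in the Dependencies section.
MODULES = [
    "mujoco_py",
    "mani_skill2",
    "metaworld",
    "stable_baselines3",
    "langchain",
    "chromadb",
]


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```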
Usage
Reimplement
To reproduce our experimental results, run the following scripts:
ManiSkill2:
bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh
It's normal to encounter the following warnings:
[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.
MetaWorld:
bash run_oracle.sh
bash run_zero_shot.sh
Generate new reward code
First, add the following environment variable to your .bashrc (or .zshrc, etc.):
export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward
Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:
# generate reward code for ManiSkill2
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh
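If the generation scripts fail with import errors, one thing worth checking is whether the PYTHONPATH export above actually took effect in the current shell. A minimal sketch, assuming the repository root is the directory you exported; the helper name on_pythonpath is hypothetical and not part of this repository:

```python
import os
from pathlib import Path


def on_pythonpath(repo_root, pythonpath=None):
    """Return True if repo_root is one of the entries in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(repo_root).expanduser().resolve()
    # Skip empty entries produced by leading or trailing separators.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(Path(entry).expanduser().resolve() == target for entry in entries)


if __name__ == "__main__":
    # Replace with the path used in your own export line.
    print(on_pythonpath("~/path/to/text2reward"))
```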
Run new experiment
By default, the run_oracle.sh script above uses the expert-written rewards provided by the environment, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. To run a new experiment with your own reward, follow the bash scripts above and set the --reward_path parameter to the path of your own reward code.
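To give a feel for what dense reward code can look like, here is a toy, self-contained sketch. It is not the interface the training scripts expect: the function name, arguments, weights, and success radius are all hypothetical, and the actual signature should be copied from the generated reward files used by run_zero_shot.sh and run_few_shot.sh.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compute_dense_reward(ee_pos, obj_pos, goal_pos, success_radius=0.02):
    """Toy two-stage reward: reach the object, then bring it to the goal.

    All arguments are hypothetical 3D positions (end effector, object, goal).
    """
    reach_dist = distance(ee_pos, obj_pos)
    goal_dist = distance(obj_pos, goal_pos)
    # 1 - tanh(k * d) lies in (0, 1] and increases as the distance shrinks.
    reaching = 1.0 - math.tanh(5.0 * reach_dist)
    placing = 1.0 - math.tanh(5.0 * goal_dist)
    reward = reaching + placing
    # Bonus once the object is within the success radius of the goal.
    if goal_dist < success_radius:
        reward += 1.0
    return reward
```

Unlike a sparse success signal, a reward shaped this way changes at every step as the end effector and object move, which is the kind of graded feedback a dense reward is meant to provide.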
Citation
If you find our work helpful, please cite us:
@article{text2reward,
title={Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning},
author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
journal={arXiv preprint arXiv:2309.11489},
year={2023}
}