Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning

Code for the paper Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning. Please refer to our project page for more demonstrations and up-to-date resources.

Updates

2023-10-09: We released our code.
2023-09-20: We released the paper and website of Text2Reward.

Dependencies

To set up the environment, run the following commands in the shell:

# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0

Troubleshooting

If you have not installed MuJoCo yet, please follow the instructions here to install it. Then run the following commands to confirm that the installation succeeded:

$ python3
>>> import mujoco_py
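
The same check can be run non-interactively, which is convenient in setup scripts. The following is a minimal sketch, not part of the repository: the helper name check_import is hypothetical, and it only reports whether the import succeeds, without diagnosing why it might fail.

```shell
# Hypothetical helper (not part of the repository): report whether a
# Python module can be imported by the python3 found on PATH.
check_import() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Confirm the mujoco_py installation described above.
check_import mujoco_py
```

If the line printed starts with "missing", run the interactive import shown above to see the full error message.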

If you encounter the following errors when running ManiSkill2, please refer to the documentation here.
- RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
- Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
- Segmentation fault (core dumped)

Usage

Reimplement

To reproduce our experimental results, run the following scripts:

ManiSkill2:

bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh

It's normal to encounter the following warnings:

[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.

MetaWorld:

bash run_oracle.sh
bash run_zero_shot.sh

Generate new reward code

First, add the following environment variable to your .bashrc (or .zshrc, etc.):

export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward
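
If you prefer to add this line from a script, it helps to avoid appending it more than once. The sketch below is one possible way to do that; the helper name add_pythonpath_line is hypothetical, and ~/path/to/text2reward remains a placeholder for wherever you cloned the repository.

```shell
# Hypothetical helper (not part of the repository): append the PYTHONPATH
# export line to a shell startup file, skipping it if already present.
add_pythonpath_line() {
    rc_file="$1"     # startup file, e.g. "$HOME/.bashrc"
    repo_dir="$2"    # repository location, e.g. "$HOME/path/to/text2reward"
    line="export PYTHONPATH=\$PYTHONPATH:$repo_dir"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        echo "$line" >> "$rc_file"
    fi
}

# Example call (commented out so that nothing is modified by accident):
# add_pythonpath_line "$HOME/.bashrc" "$HOME/path/to/text2reward"
```

Open a new shell, or source the startup file, for the change to take effect.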

Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:

# generate reward code for ManiSkill2
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh

Run new experiment

By default, the run_oracle.sh script above uses the expert-written rewards provided by the environment, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. To run a new experiment with your own reward, follow the bash scripts above and set the --reward_path parameter to the path of your reward file.
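
One way to do this without editing the provided scripts is to write a modified copy. The sketch below makes assumptions that are not confirmed here: that the flag appears in the script as "--reward_path <value>" or "--reward_path=<value>", and that the value contains no whitespace. The helper name with_reward_path and the file names run_my_reward.sh and my_reward.py are hypothetical.

```shell
# Hypothetical helper (not part of the repository): copy a run script and
# point every --reward_path argument at a different reward file.
with_reward_path() {
    src="$1"       # existing script, e.g. run_zero_shot.sh
    dst="$2"       # new script to write
    reward="$3"    # path to your reward file (must not contain "|" or "&")
    # Keep the flag and its separator (space or "="), replace only the value.
    sed -E 's|(--reward_path[ =])[^[:space:]]+|\1'"$reward"'|' "$src" > "$dst"
}

# Example (commented out; output and reward file names are placeholders):
# with_reward_path run_zero_shot.sh run_my_reward.sh ./my_reward.py
# bash run_my_reward.sh
```

Check the generated script before running it, since the replacement is purely textual and leaves all other arguments unchanged.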

Citation

If you find our work helpful, please cite us:

@article{text2reward,
  title={Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning},
  author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  journal={arXiv preprint arXiv:2309.11489},
  year={2023}
}
